jcyk / amr-gs

70 stars, 3 watchers, 21 forks, 393 KB

AMR Parsing via Graph-Sequence Iterative Inference

License: MIT License

Languages: Shell 0.99%, Python 98.62%, C++ 0.40%
Topics: acl2020, amr, amr-parser, semantic-parsing, abstract-meaning-representation, amr-parsing

amr-gs's Issues

.features file not found when running preprocess_raw.sh

I'm trying to parse raw sentences with a pretrained model. The file has one sentence per line as expected and I am running the Stanford CoreNLP server.

When I try to run

sh preprocess_raw.sh my_filename.txt

it successfully creates my_filename.txt.raw, as well as an empty file called my_filename.txt.raw.features.input_clean, but then throws an error:
FileNotFoundError: [Errno 2] No such file or directory: 'data/AMR/amr_2.0/my_filename.txt.raw.features'

I'm not sure where the .features file is supposed to come from - am I missing a preprocessing step?

I thought maybe I needed to run annotate_features.sh, but that seems to expect a full directory structure with train/dev/test split, and I just want to run a pretrained model on a file of test sentences.

Training with LDC2020T02 and "amrlib" project

Just FYI in case anyone's interested...

I trained the "no recategorization" branch of the model with LDC2020T02 (AMR 3.0) and got a 76.7 smatch score. I didn't spend much time trying to optimize hyperparameters, and I'm using the AMR 2.0 utils directory, so there may be additional optimizations to be had; or maybe the AMR 3.0 corpus is just more complex, with the new "multi-sentence" annotations and so on.

I'm also using this model in amrlib with all of the sheng-z/stog code removed. In my version of the model code there's no pre/post-processing at all, and I've also switched to spaCy for annotations. I'm getting about the same 77 smatch under these conditions.

amrlib is intended as a user library for parsing and generation. I've simplified some of the parsing routines for the end user, updated the code to the latest versions of penman and pytorch, sped up smatch scoring, and so on. Feel free to pull portions of the revised code if you have any interest. I'd be happy to see a little more optimization of the model in that setting, though I'm not planning to focus on it myself.

The library also includes a Huggingface T5 model re-trained for graph-to-sentence generation that gets a 43 BLEU on LDC2020T02. It's a lot easier coding-wise than jcyk/gtos and amazingly effective.
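
For anyone who wants to try it, here is a minimal usage sketch of amrlib's parse/generate interface; the load_stog_model / load_gtos_model calls follow amrlib's documented API, but the example sentence and the assumption that the default models are already installed are mine:

# Minimal amrlib usage sketch. Assumes the parse (stog) and generate (gtos)
# models have been downloaded and installed per amrlib's documentation.
import amrlib

# Sentence-to-graph: parse raw sentences into AMR graphs (PENMAN strings).
stog = amrlib.load_stog_model()
graphs = stog.parse_sents(["The boy wants to go to New York."])
print(graphs[0])

# Graph-to-sentence: regenerate text from AMR with the T5-based model.
gtos = amrlib.load_gtos_model()
sents, _ = gtos.generate(graphs)
print(sents[0])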

Question about epoch setting and early stopping

Hi Deng,

First, congratulations on your ACL work.

I read your paper and code, and I noticed that the paper mentions early stopping, but I can't find it in the code. Should I stop training by hand? Another question is about the epoch setting: the number of epochs in your shell script is 100000, which is so large that training would take a very long time. Is there some extra detail I missed?

Thanks a lot.

Kind regards,
Feng Yunlong

Error when loading Bert

Hello Deng, thank you for your great work. I ran into some trouble when trying to use the pretrained model to parse my own text file.

First, I ran sh preprocess_raw.sh sample.txt, where sample.txt contains only one sentence.

Second, I changed the script work.sh as follows:

python3 -u -m parser.work \
    --test_data sample.txt.raw.features.preproc \
    --test_batch_size 1 \
    --load_path /home/lchen/AMR-gs/amr2.0.bert.gr/ckpt.pt \
    --beam_size 8 \
    --alpha 0.6 \
    --max_time_step 100 \
    --output_suffix _test_out

and I get the following message:

Traceback (most recent call last):
  File "/home/anaconda3/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/anaconda3/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/lchen/AMR-gs/parser/work.py", line 115, in <module>
    bert_tokenizer = BertEncoderTokenizer.from_pretrained(model_args.bert_path, do_lower_case=False)
  File "/home/lchen/AMR-gs/amr/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 282, in from_pretrained
    return cls._from_pretrained(*inputs, **kwargs)
  File "/home/lchen/AMR-gs/amr/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 346, in _from_pretrained
    list(cls.vocab_files_names.values())))
OSError: Model name '../bert-base-cased' was not found in tokenizers model name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc, bert-base-german-dbmdz-cased, bert-base-german-dbmdz-uncased). We assumed '../bert-base-cased' was a path or url to a directory containing vocabulary files named ['vocab.txt'] but couldn't find such vocabulary files at this path or url.

Is there anything wrong with my procedure?
Thanks a lot.
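
In case it helps: the error message shows that the checkpoint stores its bert_path as '../bert-base-cased', so the tokenizer is resolved relative to your working directory. One possible workaround (a sketch, not verified against this repo) is to download the Hugging Face files once and save them at exactly that relative path:

# Sketch: materialize bert-base-cased at the path the checkpoint expects.
# Run from the AMR-gs working directory so '../bert-base-cased' resolves to
# the right place; the relative path is taken from the error message above.
import os
from transformers import BertModel, BertTokenizer

os.makedirs("../bert-base-cased", exist_ok=True)
# save_pretrained writes vocab.txt (the file the tokenizer loader is looking
# for), plus the model config and weights, into that directory.
BertTokenizer.from_pretrained("bert-base-cased").save_pretrained("../bert-base-cased")
BertModel.from_pretrained("bert-base-cased").save_pretrained("../bert-base-cased")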

Cannot load pre-trained model

Hi Deng,

thanks for the awesome work. I am interested in parsing arbitrary sentences. I took your advice and appended dummy AMRs to the test sentences. The preprocessing worked fine so far; however, when I apply the pre-trained model for prediction using work.sh (with the file path adapted to the pre-trained model), I get the following error:

File "/xxxxx/AMR-gs/parser/work.py", line 94, in
model_args = torch.load(fname)['args']
....
File "/zzzzzr/lib/python3.6/site-packages/torch/serialization.py", line 563, in _load
magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '\x00'

The path in work.sh looks as follows:

--load_path amr2.0.bert.gr \

Previously I had downloaded your pre-trained model into the main directory and used tar -xvzf to extract it. Did I make an error in the extraction process? My amr2.0.bert.gr contains a file "ckpt.pt" and a directory "vocabs".

parser module naming

Just FYI in case someone else runs into this issue...

When I try to run python3 -u -m parser.extract xxx, I get an error related to the fact that parser is a standard-library module in Python 3.

I went into this directory, changed the absolute imports to relative imports (i.e., changed from parser.data import xx to from .data import xx), and renamed the top-level directory from parser to model_gsii.

python3 -u -m model_gsii.extract xxx now works correctly.

Pretrained model - Missing files

Hi Deng,

I'm trying to use the pretrained model, but the files in "data/AMR/amr_2.0_utils/" seem to be missing. Could you please update the repository with these files?

Thanks

Kind regards,

Carlos

Post-processing fails if sentence has <number><space><number>

Problem
The parser will output a non-breaking space character if the input sentence contains \d+ \d+. This leads to a post-processing failure with the error penman.DecodeError: Expected ":" or "/" at position XXX.

Example .pred file

# ::id 9900
# ::snt @united iCloud it's not there yet -- PLEASE HELP 917 703 1472
# ::tokens ["@united", "iCloud", "it", "'s", "not", "there", "yet", "--", "PLEASE", "HELP", "917\u00a0703\u00a01472"]
# ::lemmas ["@united", "icloud", "it", "be", "not", "there", "yet", "--", "please", "help", "917\u00a0703\u00a01472"]
# ::pos_tags ["VBN", "NN", "PRP", "VBZ", "RB", "RB", "RB", ":", "VB", "NN", "CD"]
# ::ner_tags ["O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "NUMBER"]
# ::abstract_map {}
(c0 / multi-sentence
    :snt1 (c1 / icloud
              :mod (c3 / be-located-at
                       :ARG1 (c7 / it)
                       :ARG2 (c8 / there)
                       :time (c9 / yet)))
    :snt2 (c2 / help-01
              :ARG1 (c4 / you)
              :mode imperative
              :ARG1 (c6 / book
                        :name (c10 / 917 703 1472))))

Note that in the last line, 917 703 1472 contains non-breaking spaces.

./postprocess_2.0.sh sample.txt.pred
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/perry/anaconda3/envs/stog/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/home/perry/anaconda3/envs/stog/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/home/perry/PycharmProjects/phd/AMR-gs-master/stog/data/dataset_readers/amr_parsing/postprocess/postprocess.py", line 16, in postprocess2
    for amr in nr.restore_file(file_path):
  File "/home/perry/PycharmProjects/phd/AMR-gs-master/stog/data/dataset_readers/amr_parsing/postprocess/node_restore.py", line 19, in restore_file
    for amr in AMRIO.read(file_path):
  File "/home/perry/PycharmProjects/phd/AMR-gs-master/stog/data/dataset_readers/amr_parsing/io.py", line 48, in read
    amr.graph = AMRGraph.decode(' '.join(graph_lines))
  File "/home/perry/PycharmProjects/phd/AMR-gs-master/stog/data/dataset_readers/amr_parsing/amr.py", line 640, in decode
    _graph = amr_codec.decode(raw_graph_string)
  File "/home/perry/anaconda3/envs/stog/lib/python3.6/site-packages/penman.py", line 172, in decode
    span, data = self._decode_penman_node(s)
  File "/home/perry/anaconda3/envs/stog/lib/python3.6/site-packages/penman.py", line 405, in _decode_penman_node
    span, data = self._decode_penman_node(s, pos=pos)
  File "/home/perry/anaconda3/envs/stog/lib/python3.6/site-packages/penman.py", line 405, in _decode_penman_node
    span, data = self._decode_penman_node(s, pos=pos)
  File "/home/perry/anaconda3/envs/stog/lib/python3.6/site-packages/penman.py", line 405, in _decode_penman_node
    span, data = self._decode_penman_node(s, pos=pos)
  File "/home/perry/anaconda3/envs/stog/lib/python3.6/site-packages/penman.py", line 427, in _decode_penman_node
    raise DecodeError('Expected ":" or "/"', string=s, pos=pos)
penman.DecodeError: Expected ":" or "/" at position 364

Workaround
Check for non-breaking spaces and replace them with - or _ in the output.
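
A minimal sketch of that workaround in Python (the file handling and the choice of "-" as the replacement character are just examples):

# Sketch: replace non-breaking spaces (U+00A0) in a .pred file before
# running postprocess_2.0.sh, so penman can decode the graphs.
import sys

pred_path = sys.argv[1]  # e.g. sample.txt.pred
with open(pred_path, encoding="utf-8") as f:
    text = f.read()

# Substitute a plain hyphen for each non-breaking space, as suggested above.
with open(pred_path, "w", encoding="utf-8") as f:
    f.write(text.replace("\u00a0", "-"))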

Problem with recognizing unseen NEs

Hi Deng,

I am now able to parse arbitrary sentences with your pre-trained model, thanks to your valuable tips!

My pipeline looks as follows:

  1. annotate_features.sh directory
  2. preprocess_2.0.sh
  3. work.sh
  4. postprocess_2.0.sh ckpt.pt_test_out.pred

That writes a file ckpt.pt_test_out.pred.post, which I assume is the final result.

However, the quality of the parses for arbitrary sentences is not so good. The parser seems to struggle with unseen named entities, and this leads to errors. Named entities that are in the training data are recognized perfectly, but new ones much less so. Here is an example of a short sentence:

# ::id dasd
# ::snt In November 1581, Feodor's elder brother Ivan Ivanovich was killed by their father in a fit of rage.
# ::tokens ["In", "November", "1581", ",", "Feodor", "'s", "elder", "brother", "Ivan", "Ivanovich", "was", "killed", "by", "their", "father", "in", "a", "fit", "of", "rage", "."]
# ::lemmas ["in", "November", "1581", ",", "Feodor", "'s", "elder", "brother", "Ivan", "Ivanovich", "be", "kill", "by", "they", "father", "in", "a", "fit", "of", "rage", "."]
# ::pos_tags ["IN", "NNP", "CD", ",", "NNP", "POS", "JJR", "NN", "NNP", "NNP", "VBD", "VBN", "IN", "PRP$", "NN", "IN", "DT", "NN", "IN", "NN", "."]
# ::ner_tags ["O", "DATE", "DATE", "O", "PERSON", "O", "O", "O", "PERSON", "PERSON", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O"]
# ::abstract_map {}
(c0 / kill-01
      :ARG0 (c2 / person
            :ARG0-of (c7 / have-rel-role-91
                  :ARG1 (c11 / person)
                  :ARG2 (c13 / brother)))
      :ARG1 (c1 / person
            :ARG0-of (c6 / have-rel-role-91
                  :ARG0 c2
                  :ARG2 c11
                  :ARG2 (c12 / father))
            :mod (c5 / elder
                  :domain c2))
      :time (c3 / name
            :op1 "november"
            :op2 "ivan"
            :op3 "1581")
      :time (c4 / rage-02
            :ARG1 c1))

The parser has apparently struggled with the new named entities. Because of this, the parse also contains many other errors (e.g. two ARG2 edges under c6). It has not even detected the tsar "Feodor".

Here is an example of a longer sentence, this time from the US press.

# ::id sda sd
# ::snt The Grand Slam at Flushing Meadows is scheduled to begin on August 31, but with New York one of the cities hardest hit by coronavirus there are doubts over whether the tournament can take place.
# ::tokens ["The", "Grand", "Slam", "at", "Flushing", "Meadows", "is", "scheduled", "to", "begin", "on", "August", "31", ",", "but", "with", "New", "York", "one", "of", "the", "cities", "hardest", "hit", "by", "coronavirus", "there", "are", "doubts", "over", "whether", "the", "tournament", "can", "take", "place", "."]
# ::lemmas ["the", "Grand", "Slam", "at", "Flushing", "Meadows", "be", "schedule", "to", "begin", "on", "August", "31", ",", "but", "with", "New", "York", "one", "of", "the", "city", "hardest", "hit", "by", "coronavirus", "there", "be", "doubt", "over", "whether", "the", "tournament", "can", "take", "place", "."]
# ::pos_tags ["DT", "NNP", "NNP", "IN", "NNP", "NNP", "VBZ", "VBN", "TO", "VB", "IN", "NNP", "CD", ",", "CC", "IN", "NNP", "NNP", "CD", "IN", "DT", "NNS", "RBS", "VBN", "IN", "NN", "EX", "VBP", "NNS", "IN", "IN", "DT", "NN", "MD", "VB", "NN", "."]
# ::ner_tags ["O", "MISC", "MISC", "O", "LOCATION", "LOCATION", "O", "O", "O", "O", "O", "DATE", "DATE", "O", "O", "O", "STATE_OR_PROVINCE", "STATE_OR_PROVINCE", "NUMBER", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O"]
# ::abstract_map {}
(c0 / have-concession-91
      :ARG1 (c1 / schedule-01
            :ARG1 (c3 / begin-01)
            :ARG3 (c4 / date-entity
                  :mod 31
                  :mod (c9 / august)))
      :ARG2 (c2 / doubt-01
            :ARG1 (c5 / possible-01
                  :ARG1 (c10 / take-01
                        :ARG1 (c12 / grand)
                        :ARG1 (c13 / manufacture-01)))
            :ARG1-of (c6 / cause-01
                  :ARG0 (c11 / include-91
                        :ARG2 (c14 / city
                              :ARG1-of (c15 / hit-01
                                    :ARG0-of (c17 / near-02
                                          :degree (c18 / most))
                                    :ARG2 (c16 / coronavirus)))))
            :topic (c7 / slam-02
                  :ARG1 c12)))

Again, it has not properly recognized any named entity, and because of this (?) it has also made many other errors (like Grand Slam --> "grand manufacture").

A last example, from sports.


# ::id sada
# ::snt Zverev joked that he had been persuaded to play at the tournament by a threat from Djokovic that he would never let him win against him otherwise.
# ::tokens ["Zverev", "joked", "that", "he", "had", "been", "persuaded", "to", "play", "at", "the", "tournament", "by", "a", "threat", "from", "Djokovic", "that", "he", "would", "never", "let", "him", "win", "against", "him", "otherwise", "."]
# ::lemmas ["Zverev", "joke", "that", "he", "have", "be", "persuade", "to", "play", "at", "the", "tournament", "by", "a", "threat", "from", "Djokovic", "that", "he", "would", "never", "let", "he", "win", "against", "he", "otherwise", "."]
# ::pos_tags ["NNP", "VBD", "IN", "PRP", "VBD", "VBN", "VBN", "TO", "VB", "IN", "DT", "NN", "IN", "DT", "NN", "IN", "NNP", "IN", "PRP", "MD", "RB", "VB", "PRP", "VB", "IN", "PRP", "RB", "."]
# ::ner_tags ["PERSON", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "PERSON", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O"]
# ::abstract_map {}
(c0 / joke-01
      :ARG0 (c1 / person 
            :ARG0-of (c3 / have-org-role-91
                  :ARG2 (c8 / zverev))
            :ARG0-of (c4 / play-01
                  :ARG1 (c9 / date-entity))
            :ARG0-of (c5 / let-01
                  :ARG1 (c6 / win-01
                        :ARG2 c1)
                  :time (c7 / threaten-01
                        :ARG0 (c12 / company)
                        :ARG2 c1)
                  :time (c10 / otherwise
                        :op1 c6)
                  :time (c11 / ever)))
      :ARG2 (c2 / persuade-01
            :ARG0 c7
            :ARG1 c1))

Again, none of the named entities were recognized, and the parser has hallucinated new concepts (e.g., "(c12 / company)"). The famous tennis player Djokovic does not even occur in the parse.

These sentences are just randomly sampled; all my outputs look more or less like this. Do you have any idea where the problem could be? It doesn't seem to be in the post-processing: the NE errors are already present (mostly, as far as I can assess) in the parser output file ckpt.pt_test_out/ckpt.pt_test_out.pred.

Could it be because the # ::abstract_map {} is always empty?

Post-processing fails if sentence is quoted

Problem
The parser will output unbalanced quotes if the input sentence is quoted. This leads to a post-processing failure with the error penman.DecodeError: Expected ":" or "/" at position XXX.

Example .pred file

# ::id 8553
# ::snt "@united got it right with the safety demonstration! Corporate but funny, reserved but NOT CORNY  as a… http://t.co/lwOtKIEKGU"
# ::tokens ["''", "@united", "got", "it", "right", "with", "the", "safety", "demonstration", "!", "Corporate", "but", "funny", ",", "reserved", "but", "NOT", "CORNY", "as", "a", "...", "http://t.co/lwOtKIEKGU\""]
# ::lemmas ["''", "@united", "get", "it", "right", "with", "the", "safety", "demonstration", "!", "corporate", "but", "funny", ",", "reserved", "but", "NOT", "CORNY", "as", "a", "...", "http://t.co/lwotkiekgu\""]
# ::pos_tags ["''", "VBN", "VBD", "PRP", "RB", "IN", "DT", "NN", "NN", ".", "JJ", "CC", "JJ", ",", "JJ", "CC", "NNP", "NNP", "IN", "DT", ":", "NN"]
# ::ner_tags ["O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "URL"]
# ::abstract_map {}
(c0 / multi-sentence
    :snt1 (c1 / contrast
              :ARG1 (c3 / right-06
                        :ARG1 (c7 / it)
                        :ARG2 (c8 / demonstrate
                                  :ARG1 (c10 / safe)))
              :ARG1 (c4 / funny
                        :domain (c9 / string-entity
                                    :value "http://t.co/lwotkiekgu"")))
    :snt2 (c2 / contrast
              :ARG1 (c5 / corporate)
              :ARG2 (c6 / corny
                        :domain c9)))

Note the two quotes at the end of :value "http://t.co/lwotkiekgu"".

./postprocess_2.0.sh sample.txt.pred
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/perry/anaconda3/envs/stog/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/home/perry/anaconda3/envs/stog/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/home/perry/PycharmProjects/phd/AMR-gs-master/stog/data/dataset_readers/amr_parsing/postprocess/postprocess.py", line 16, in postprocess2
    for amr in nr.restore_file(file_path):
  File "/home/perry/PycharmProjects/phd/AMR-gs-master/stog/data/dataset_readers/amr_parsing/postprocess/node_restore.py", line 19, in restore_file
    for amr in AMRIO.read(file_path):
  File "/home/perry/PycharmProjects/phd/AMR-gs-master/stog/data/dataset_readers/amr_parsing/io.py", line 48, in read
    amr.graph = AMRGraph.decode(' '.join(graph_lines))
  File "/home/perry/PycharmProjects/phd/AMR-gs-master/stog/data/dataset_readers/amr_parsing/amr.py", line 640, in decode
    _graph = amr_codec.decode(raw_graph_string)
  File "/home/perry/anaconda3/envs/stog/lib/python3.6/site-packages/penman.py", line 172, in decode
    span, data = self._decode_penman_node(s)
  File "/home/perry/anaconda3/envs/stog/lib/python3.6/site-packages/penman.py", line 405, in _decode_penman_node
    span, data = self._decode_penman_node(s, pos=pos)
  File "/home/perry/anaconda3/envs/stog/lib/python3.6/site-packages/penman.py", line 405, in _decode_penman_node
    span, data = self._decode_penman_node(s, pos=pos)
  File "/home/perry/anaconda3/envs/stog/lib/python3.6/site-packages/penman.py", line 405, in _decode_penman_node
    span, data = self._decode_penman_node(s, pos=pos)
  File "/home/perry/anaconda3/envs/stog/lib/python3.6/site-packages/penman.py", line 427, in _decode_penman_node
    raise DecodeError('Expected ":" or "/"', string=s, pos=pos)
penman.DecodeError: Expected ":" or "/" at position 375

Workaround
Eliminate quotes before inputting sentences.
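
A minimal sketch of that workaround on the input side, reading one sentence per line as preprocess_raw.sh expects (the file names are just examples):

# Sketch: strip double quotes from raw input sentences before running
# preprocess_raw.sh, to avoid unbalanced quotes in the predicted graphs.
import sys

in_path, out_path = sys.argv[1], sys.argv[2]  # e.g. raw.txt cleaned.txt
with open(in_path, encoding="utf-8") as fin, \
     open(out_path, "w", encoding="utf-8") as fout:
    for line in fin:
        # Remove straight and curly double quotes; keep everything else.
        for q in ('"', "\u201c", "\u201d"):
            line = line.replace(q, "")
        fout.write(line)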

Final output files of the models

Hi Deng,

first of all, great work. I wondered if you could share your "official" output files that score 78.7 and 80.2 on the test set. I'd like to do some analysis regarding other AMR metrics (SemBleu, s2match), and perhaps performance over sentence length.

If possible, I'd also like the test output of your model from the "core semantic first" paper, which scored 73.2.

Thanks a lot! Rik

Trouble parsing with a pre-trained model

Hello Deng, first of all thanks for the great work!

I'm trying to use your parser to process a simple input file with the pre-trained amr2.0.bert.gr model. I followed the steps described in the AMR Parsing with Pretrained Models section up to step 3, where I get the following error:

$ python -u -m parser.work --test_data test/input.txt.raw.features.preproc --load_path test/amr2.0.bert.gr/ckpt.pt --output_suffix _test_out
To use data.metrics please install scikit-learn. See https://scikit-learn.org/stable/index.html
Traceback (most recent call last):
  File "/home/waszczuk/.conda/envs/amrgs/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/waszczuk/.conda/envs/amrgs/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/waszczuk/work/github/other/AMR-gs/parser/work.py", line 101, in <module>
    vocabs['tok'] = Vocab(model_args.tok_vocab, 5, [CLS])
  File "/home/waszczuk/work/github/other/AMR-gs/parser/data.py", line 17, in __init__
    for line in open(filename).readlines():
FileNotFoundError: [Errno 2] No such file or directory: '../data/AMR/amr_2.0_reca/tok_vocab'

Am I doing something wrong? Maybe the argument I provide for the load_path parameter is not correct?

About the transferability to Chinese of your work

Hi Deng,

First, congratulations on your ACL work.

I'm quite new to AMR and just want to use it as a tool for extracting structured information for further use.

I'm wondering if it is possible to transfer your work to Chinese and train an AMR parsing model for Chinese. If so, could you give me some suggestions on how to achieve that goal?

Thanks a lot.

Kind regards,
Haochun

RuntimeError: CUDA error when running work.sh

Hi, I'm trying to use the pretrained model to parse text into AMR graphs. I ran the Stanford CoreNLP server, and the data preprocessing was successful (test.txt.raw.features.preproc was generated).
However, when I run work.sh, this strange bug appears, and I have no clue why. I tried running it on CPU and the CUDA error still showed up. Do you have any advice on this?
My work.sh looks like this:

python3 -u -m parser.work --test_data test.txt.raw.features.preproc --test_batch_size 6666 --load_path ckpt/ckpt.pt --beam_size 8 --alpha 0.6 --max_time_step 100 --output_suffix _test_out

Error log:

sh work.sh
To use data.metrics please install scikit-learn. See https://scikit-learn.org/stable/index.html
read from test.txt.raw.features.preproc, 2 amrs
Get 2 AMRs from test.txt.raw.features.preproc
ckpt/ckpt.pt
Traceback (most recent call last):
  File "/media/ntu/volume1/home/s122md301_07/anaconda3/envs/AMR-gs/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/media/ntu/volume1/home/s122md301_07/anaconda3/envs/AMR-gs/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/media/ntu/volume1/home/s122md301_07/AMR-gs/parser/work.py", line 145, in <module>
    parse_data(model, pp, another_test_data, args.test_data, test_model+args.output_suffix, args.beam_size, args.alpha, args.max_time_step)
  File "/media/ntu/volume1/home/s122md301_07/AMR-gs/parser/work.py", line 68, in parse_data
    res = parse_batch(model, batch, beam_size, alpha, max_time_step)
  File "/media/ntu/volume1/home/s122md301_07/AMR-gs/parser/work.py", line 42, in parse_batch
    beams = model.work(batch, beam_size, max_time_step)
  File "/media/ntu/volume1/home/s122md301_07/AMR-gs/parser/parser.py", line 83, in work
    word_repr, word_mask, probe = self.encode_step_with_bert(data['tok'], data['lem'], data['pos'], data['ner'], data['word_char'], data['bert_token'], data['token_subword_index'])
  File "/media/ntu/volume1/home/s122md301_07/AMR-gs/parser/parser.py", line 65, in encode_step_with_bert
    bert_embed, _ = self.bert_encoder(bert_token, token_subword_index=token_subword_index)
  File "/media/ntu/volume1/home/s122md301_07/anaconda3/envs/AMR-gs/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/ntu/volume1/home/s122md301_07/AMR-gs/parser/bert_utils.py", line 55, in forward
    input_ids, attention_mask, token_type_ids)
  File "/media/ntu/volume1/home/s122md301_07/anaconda3/envs/AMR-gs/lib/python3.6/site-packages/transformers/modeling_bert.py", line 627, in forward
    head_mask=head_mask)
  File "/media/ntu/volume1/home/s122md301_07/anaconda3/envs/AMR-gs/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/ntu/volume1/home/s122md301_07/anaconda3/envs/AMR-gs/lib/python3.6/site-packages/transformers/modeling_bert.py", line 348, in forward
    layer_outputs = layer_module(hidden_states, attention_mask, head_mask[i])
  File "/media/ntu/volume1/home/s122md301_07/anaconda3/envs/AMR-gs/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/ntu/volume1/home/s122md301_07/anaconda3/envs/AMR-gs/lib/python3.6/site-packages/transformers/modeling_bert.py", line 326, in forward
    attention_outputs = self.attention(hidden_states, attention_mask, head_mask)
  File "/media/ntu/volume1/home/s122md301_07/anaconda3/envs/AMR-gs/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/ntu/volume1/home/s122md301_07/anaconda3/envs/AMR-gs/lib/python3.6/site-packages/transformers/modeling_bert.py", line 283, in forward
    self_outputs = self.self(input_tensor, attention_mask, head_mask)
  File "/media/ntu/volume1/home/s122md301_07/anaconda3/envs/AMR-gs/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/ntu/volume1/home/s122md301_07/anaconda3/envs/AMR-gs/lib/python3.6/site-packages/transformers/modeling_bert.py", line 202, in forward
    mixed_query_layer = self.query(hidden_states)
  File "/media/ntu/volume1/home/s122md301_07/anaconda3/envs/AMR-gs/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/ntu/volume1/home/s122md301_07/anaconda3/envs/AMR-gs/lib/python3.6/site-packages/torch/nn/modules/linear.py", line 87, in forward
    return F.linear(input, self.weight, self.bias)
  File "/media/ntu/volume1/home/s122md301_07/anaconda3/envs/AMR-gs/lib/python3.6/site-packages/torch/nn/functional.py", line 1371, in linear
    output = input.matmul(weight.t())
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)

sh preprocess_raw.sh input.txt fails

Hi!
I would like to try the AMR parser with the pretrained model.
I successfully started the Stanford CoreNLP server, then tried to run preprocess_raw.sh input.txt. It fails when executing this command:

python -u -m stog.data.dataset_readers.amr_parsing.preprocess.feature_annotator \
    ${raw_file}.raw \
    --compound_file ${compound_file}

I'm running the code in a Docker container using Python 3.6.13, with all dependencies from requirements.txt. The input.txt file contains two simple sentences. The traceback is pasted below.

Am I missing something?

root@d92e2317f450:/app# sh preprocess_raw.sh input.txt
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/runpy.py", line 183, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/usr/local/lib/python3.6/runpy.py", line 109, in _get_module_details
    __import__(pkg_name)
  File "/app/stog/data/__init__.py", line 1, in <module>
    from stog.data.dataset_readers.dataset_reader import DatasetReader
  File "/app/stog/data/dataset_readers/__init__.py", line 10, in <module>
    from stog.data.dataset_readers.dataset_reader import DatasetReader
  File "/app/stog/data/dataset_readers/dataset_reader.py", line 4, in <module>
    from stog.data.instance import Instance
  File "/app/stog/data/instance.py", line 3, in <module>
    from stog.data.fields.field import DataArray, Field
  File "/app/stog/data/fields/__init__.py", line 7, in <module>
    from stog.data.fields.array_field import ArrayField
  File "/app/stog/data/fields/array_field.py", line 10, in <module>
    class ArrayField(Field[numpy.ndarray]):
  File "/app/stog/data/fields/array_field.py", line 42, in ArrayField
    @overrides
  File "/usr/local/lib/python3.6/site-packages/overrides/overrides.py", line 88, in overrides
    return _overrides(method, check_signature, check_at_runtime)
  File "/usr/local/lib/python3.6/site-packages/overrides/overrides.py", line 114, in _overrides
    _validate_method(method, super_class, check_signature)
  File "/usr/local/lib/python3.6/site-packages/overrides/overrides.py", line 135, in _validate_method
    ensure_signature_is_compatible(super_method, method, is_static)
  File "/usr/local/lib/python3.6/site-packages/overrides/signature.py", line 93, in ensure_signature_is_compatible
    ensure_return_type_compatibility(super_type_hints, sub_type_hints, method_name)
  File "/usr/local/lib/python3.6/site-packages/overrides/signature.py", line 288, in ensure_return_type_compatibility
    f"{method_name}: return type `{sub_return}` is not a `{super_return}`."
TypeError: ArrayField.empty_field: return type `None` is not a `stog.data.fields.field.Field`.

TypeError: string indices must be integers

When I run the command sh annotate_features.sh data/AMR/amr_2.0, it reports an error: TypeError: string indices must be integers. Details as follows:

  File "../AMR-gs/stog/data/dataset_readers/amr_parsing/preprocess/feature_annotator.py", line 205, in <module>
    annotation = annotator(amr.sentence)
  File "../AMR-gs/stog/data/dataset_readers/amr_parsing/preprocess/feature_annotator.py", line 75, in __call__
    annotation = self.annotate(text)
  File "../AMR-gs/stog/data/dataset_readers/amr_parsing/preprocess/feature_annotator.py", line 63, in annotate
    tokens = self.nlp.annotate(text.strip(), self.nlp_properties)['sentences'][0]['tokens']
TypeError: string indices must be integers

What could cause this? Looking forward to your reply; thank you very much!

Problem in data/AMR/amr_2.0_utils/text_anonymization_rules.json

I got an error when running sh preprocess_raw.sh sentence.txt:

Recategorizing subgraphs... Fri 06 Aug 2021 23:57:10 CST
[2021-08-06 23:57:11,707 INFO] Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex .
Traceback (most recent call last):
  File "/data/leon/miniconda3/envs/amrgs/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/data/leon/miniconda3/envs/amrgs/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/data/leon/LP4EE/RAMS_1.0/AMR-gs-master/stog/data/dataset_readers/amr_parsing/preprocess/text_anonymizor.py", line 250, in <module>
    amr.abstract_map = text_anonymizor(amr)
  File "/data/leon/LP4EE/RAMS_1.0/AMR-gs-master/stog/data/dataset_readers/amr_parsing/preprocess/text_anonymizor.py", line 61, in __call__
    amr, text_map, max_length, anonym_type, pos_tag
  File "/data/leon/LP4EE/RAMS_1.0/AMR-gs-master/stog/data/dataset_readers/amr_parsing/preprocess/text_anonymizor.py", line 83, in _abstract
    collected_entities,
  File "/data/leon/LP4EE/RAMS_1.0/AMR-gs-master/stog/data/dataset_readers/amr_parsing/preprocess/text_anonymizor.py", line 181, in _replace_span
    if length == 1 and self._leave_as_is(start, amr, text_map, anonym_type):
  File "/data/leon/LP4EE/RAMS_1.0/AMR-gs-master/stog/data/dataset_readers/amr_parsing/preprocess/text_anonymizor.py", line 143, in _leave_as_is
    if is_anonym_type(index, amr, text_map, ['DATE_ATTRS']) and next_token_is(index, 1, amr, r"^''$"):
  File "/data/leon/LP4EE/RAMS_1.0/AMR-gs-master/stog/data/dataset_readers/amr_parsing/preprocess/text_anonymizor.py", line 28, in is_anonym_type
    return lemma in text_map and text_map[lemma]['ner'] in types

The text_map above is loaded by

text_anonymizor = TextAnonymizor.from_json(os.path.join(args.util_dir, "text_anonymization_rules.json"))

So I analyzed the JSON file and found two wrong keys in it:

"span": "Chinese" in the anonym_type of named-entity

and

"span": "2002 08 28" in the anonym_type of date-entity

After I deleted them, the error above no longer appeared.

I guess the "span" keys were mis-written into text_anonymization_rules.json; is that right?

Graph recategorization fails if sentence to parse contains "span"

To reproduce, run ./preprocess.sh on any sentence containing the word "span", e.g. "Short span of time".

Error:

  File "/stog/data/dataset_readers/amr_parsing/preprocess/text_anonymizor.py", line 26, in is_anonym_type
    return lemma in text_map and text_map[lemma]['ner'] in types
TypeError: string indices must be integers

Reason: text_map[lemma] turns out to be a string ('Chinese' or '2002 08 28').

Fix (a scripted version is sketched after this list):

  • Remove "span": "Chinese" from data/AMR/amr_2.0_utils/text_anonymization_rules.json
  • Remove "span": "2002 08 28" from data/AMR/amr_2.0_utils/text_anonymization_rules.json
