jcyk / amr-gs
AMR Parsing via Graph-Sequence Iterative Inference
License: MIT License
I'm trying to parse raw sentences with a pretrained model. The file has one sentence per line as expected and I am running the Stanford CoreNLP server.
When I try to run
sh preprocess_raw.sh my_filename.txt
it successfully creates my_filename.txt.raw, as well as an empty file called my_filename.txt.raw.features.input_clean, but then throws an error:
FileNotFoundError: [Errno 2] No such file or directory: 'data/AMR/amr_2.0/my_filename.txt.raw.features'
I'm not sure where the .features file is supposed to come from - am I missing a preprocessing step?
I thought maybe I needed to run annotate_features.sh, but that seems to expect a full directory structure with a train/dev/test split, and I just want to run a pretrained model on a file of test sentences.
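In case it helps future readers: an empty .features.input_clean alongside a missing .features file is consistent with the feature annotator silently failing to reach the CoreNLP server. A quick reachability check before running the preprocessing (a generic sketch; the helper name and default URL are my own, not from this repo):

```python
import urllib.request

def corenlp_alive(url="http://localhost:9000", timeout=2):
    """Return True if something answers HTTP requests at `url`
    (CoreNLP's default server port is 9000)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except OSError:
        return False
```

If this returns False, the annotator's requests never reach the server, which would explain empty intermediate files.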
How do I get the alignments between tokens and nodes?
Just FYI in case anyone's interested...
I trained the "no recategorization" branch of the model with LDC2020T02 (AMR 3.0) and got a 76.7 Smatch score. I didn't spend much time trying to optimize hyper-parameters, and I'm using the amr 2.0 utils directory, so possibly there are additional optimizations to be had, or maybe the AMR 3.0 corpus is just more complex with the new "multi-sentence" annotations, etc.
I'm also using this model in amrlib with all of the sheng-z/stog code removed. In my version of the model code there's no pre/post processing at all. In addition, I've also switched to spaCy for annotations. I'm getting about the same 77 smatch under these conditions.
amrlib is intended as a user library for parsing and generation. I've simplified some of the parsing routines for the end-user, updated the code to the latest versions of penman and pytorch, sped up Smatch scoring, etc. Feel free to pull portions of the revised code if you have any interest. I'd be happy to see a little more optimization of the model in that setting, though I'm not planning to focus on it myself.
The library also includes a Huggingface T5 model re-trained for graph-to-sentence generation that gets a 43 BLEU on LDC2020T02. It's a lot easier coding-wise than jcyk/gtos and amazingly effective.
Hi Deng,
First congratulations on your ACL work.
I read your paper and code, and I noticed that the paper mentions early stopping, but I can't find it in the code. Should I stop training by hand? Another question is about the epoch setting: the epoch setting in your shell script is 100000, which is so big that training will take a very long time. Is there some extra detail I didn't notice?
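For context, stopping by hand is usually replaced by a validation-based check; a minimal, generic sketch of what such a helper could look like (not code from this repo, just an illustration of the pattern):

```python
class EarlyStopper:
    """Stop training when the validation metric (e.g. dev Smatch)
    hasn't improved for `patience` consecutive evaluations."""

    def __init__(self, patience=5):
        self.patience = patience
        self.best = float("-inf")
        self.bad_checks = 0

    def step(self, metric):
        """Record one validation result; return True when training should stop."""
        if metric > self.best:
            self.best = metric
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience
```

With a check like this, a huge nominal epoch count is harmless: training ends when the dev metric plateaus, not when the counter runs out.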
Thanks a lot.
Kind regards,
Feng Yunlong
Hello Deng, thank you for your great work. I ran into some trouble when trying to use the pretrained model to parse my own text file.
First, I ran sh preprocess_raw.sh sample.txt, where sample.txt contains only one sentence.
Second, I changed the script work.sh as follows:
python3 -u -m parser.work --test_data sample.txt.raw.features.preproc \
    --test_batch_size 1 \
    --load_path /home/lchen/AMR-gs/amr2.0.bert.gr/ckpt.pt \
    --beam_size 8 \
    --alpha 0.6 \
    --max_time_step 100 \
    --output_suffix _test_out
and I get the following messages:
Traceback (most recent call last):
  File "/home/anaconda3/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/anaconda3/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/lchen/AMR-gs/parser/work.py", line 115, in <module>
    bert_tokenizer = BertEncoderTokenizer.from_pretrained(model_args.bert_path, do_lower_case=False)
  File "/home/lchen/AMR-gs/amr/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 282, in from_pretrained
    return cls._from_pretrained(*inputs, **kwargs)
  File "/home/lchen/AMR-gs/amr/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 346, in _from_pretrained
    list(cls.vocab_files_names.values())))
OSError: Model name '../bert-base-cased' was not found in tokenizers model name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc, bert-base-german-dbmdz-cased, bert-base-german-dbmdz-uncased). We assumed '../bert-base-cased' was a path or url to a directory containing vocabulary files named ['vocab.txt'] but couldn't find such vocabulary files at this path or url.
Is there anything wrong with my procedure?
Thanks a lot
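For anyone hitting this: the checkpoint stores bert_path as the relative path '../bert-base-cased', which is resolved against your current working directory, so the tokenizer lookup fails unless that directory actually exists. A hedged sketch of a fallback (the helper and the hub-name fallback are my own assumptions; verify the model name against your checkpoint's saved args):

```python
import os

def resolve_bert_path(stored_path, fallback="bert-base-cased"):
    """Use the checkpoint's local BERT path if it exists on this machine;
    otherwise fall back to downloading by hub model name.

    `fallback` is an assumption: the error message above suggests the
    checkpoint was built against bert-base-cased, but check model_args.
    """
    return stored_path if os.path.isdir(stored_path) else fallback
```

Passing the resolved value to the tokenizer instead of model_args.bert_path directly avoids the dependence on where you launched the script from.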
Hi Deng,
thanks for the awesome work. I am interested in parsing arbitrary sentences. I took your advice and appended the sentences with dummy AMRs to the test file. The preprocessing worked fine so far; however, when I try to apply the pre-trained model to predict using work.sh (with an adapted file path for the pre-trained model), I get the following error:
File "/xxxxx/AMR-gs/parser/work.py", line 94, in <module>
model_args = torch.load(fname)['args']
....
File "/zzzzzr/lib/python3.6/site-packages/torch/serialization.py", line 563, in _load
magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '\x00'
The path in work.sh looks as follows:
--load_path amr2.0.bert.gr \
Previously I downloaded your pre-trained model into the main directory and used tar -xvzf to extract it. Did I make an error in the extraction process? My amr2.0.bert.gr contains a file "ckpt.pt" and a directory "vocabs".
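For reference, _pickle.UnpicklingError: invalid load key, '\x00' usually means the file handed to torch.load is not a real checkpoint at all (e.g. a truncated download or a Git-LFS pointer file) rather than an extraction mistake. A rough first-bytes sanity check (a heuristic sketch; the helper name is mine):

```python
def looks_like_torch_checkpoint(path):
    """Heuristic: torch.save writes either a zip archive (magic "PK")
    or a legacy pickle stream (protocol marker byte 0x80)."""
    with open(path, "rb") as f:
        magic = f.read(2)
    return magic.startswith(b"PK") or magic.startswith(b"\x80")
```

If this returns False for your ckpt.pt, re-download the archive and compare file sizes before extracting again.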
Just FYI in case someone else runs into this issue...
When I try to run python3 -u -m parser.extract xxx, I get an error related to the fact that parser is a standard module in Python 3.
I went into this directory, changed the absolute imports to relative imports (i.e., changed from parser.data import xx to from .data import xx), and renamed the top-level directory from parser to model_gsii.
python3 -u -m model_gsii.extract xxx now works correctly.
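A quick way to check whether a candidate package name already resolves to something importable (such as a stdlib module) before committing to a rename; a generic sketch, not from this repo:

```python
import importlib.util

def resolves_to_existing_module(name):
    """True if `name` already resolves to an importable module, which would
    collide with (or shadow) a local package of the same name."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False
```

Running this for "parser" on Python 3.6 returns True (it was a stdlib module until its removal in 3.10), which is exactly the collision described above; "model_gsii" returns False, so it is a safe choice.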
Hi Deng,
I'm trying to use the pretrained model, but the files in "data/AMR/amr_2.0_utils/" seem to be missing. Could you please update the repository with these files?
Thanks
Kind regards,
Carlos
Problem
The parser will output a non-breaking space character if the input sentence contains a pattern like \d+ \d+. This leads to a post-processing failure with the error penman.DecodeError: Expected ":" or "/" at position XXX.
Example .pred file:
# ::id 9900
# ::snt @united iCloud it's not there yet -- PLEASE HELP 917 703 1472
# ::tokens ["@united", "iCloud", "it", "'s", "not", "there", "yet", "--", "PLEASE", "HELP", "917\u00a0703\u00a01472"]
# ::lemmas ["@united", "icloud", "it", "be", "not", "there", "yet", "--", "please", "help", "917\u00a0703\u00a01472"]
# ::pos_tags ["VBN", "NN", "PRP", "VBZ", "RB", "RB", "RB", ":", "VB", "NN", "CD"]
# ::ner_tags ["O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "NUMBER"]
# ::abstract_map {}
(c0 / multi-sentence
:snt1 (c1 / icloud
:mod (c3 / be-located-at
:ARG1 (c7 / it)
:ARG2 (c8 / there)
:time (c9 / yet)))
:snt2 (c2 / help-01
:ARG1 (c4 / you)
:mode imperative
:ARG1 (c6 / book
:name (c10 / 917 703 1472))))
Note that in the last line, 917 703 1472 contains non-breaking spaces.
./postprocess_2.0.sh sample.txt.pred
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/perry/anaconda3/envs/stog/lib/python3.6/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/home/perry/anaconda3/envs/stog/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "/home/perry/PycharmProjects/phd/AMR-gs-master/stog/data/dataset_readers/amr_parsing/postprocess/postprocess.py", line 16, in postprocess2
for amr in nr.restore_file(file_path):
File "/home/perry/PycharmProjects/phd/AMR-gs-master/stog/data/dataset_readers/amr_parsing/postprocess/node_restore.py", line 19, in restore_file
for amr in AMRIO.read(file_path):
File "/home/perry/PycharmProjects/phd/AMR-gs-master/stog/data/dataset_readers/amr_parsing/io.py", line 48, in read
amr.graph = AMRGraph.decode(' '.join(graph_lines))
File "/home/perry/PycharmProjects/phd/AMR-gs-master/stog/data/dataset_readers/amr_parsing/amr.py", line 640, in decode
_graph = amr_codec.decode(raw_graph_string)
File "/home/perry/anaconda3/envs/stog/lib/python3.6/site-packages/penman.py", line 172, in decode
span, data = self._decode_penman_node(s)
File "/home/perry/anaconda3/envs/stog/lib/python3.6/site-packages/penman.py", line 405, in _decode_penman_node
span, data = self._decode_penman_node(s, pos=pos)
File "/home/perry/anaconda3/envs/stog/lib/python3.6/site-packages/penman.py", line 405, in _decode_penman_node
span, data = self._decode_penman_node(s, pos=pos)
File "/home/perry/anaconda3/envs/stog/lib/python3.6/site-packages/penman.py", line 405, in _decode_penman_node
span, data = self._decode_penman_node(s, pos=pos)
File "/home/perry/anaconda3/envs/stog/lib/python3.6/site-packages/penman.py", line 427, in _decode_penman_node
raise DecodeError('Expected ":" or "/"', string=s, pos=pos)
penman.DecodeError: Expected ":" or "/" at position 364
Workaround
Check for non-breaking spaces in the output and replace them with - or _.
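The workaround can be applied to each graph line of the .pred file before post-processing; a minimal sketch (the function name is mine):

```python
def replace_nbsp(line, repl="_"):
    """Replace non-breaking spaces (U+00A0) so the penman decoder
    doesn't see a bare multi-token literal inside a node."""
    return line.replace("\u00a0", repl)
```

Applying this to the example above turns the offending node into (c10 / 917_703_1472), which penman can decode.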
Hi Deng,
I am now able to parse arbitrary sentences with your pre-trained model, thanks to your valuable tips!
My pipeline looks as follows:
That writes a file ckpt.pt_test_out.pred.post, of which I assume that it is the final result.
However, the quality of arbitrary-sentence parses is not so good. The model seems to struggle with unseen named entities, and this leads to errors. Named entities that are in the training data are recognized perfectly, but new ones much less so. Here is an example of a short sentence:
# ::id dasd
# ::snt In November 1581, Feodor's elder brother Ivan Ivanovich was killed by their father in a fit of rage.
# ::tokens ["In", "November", "1581", ",", "Feodor", "'s", "elder", "brother", "Ivan", "Ivanovich", "was", "killed", "by", "their", "father", "in", "a", "fit", "of", "rage", "."]
# ::lemmas ["in", "November", "1581", ",", "Feodor", "'s", "elder", "brother", "Ivan", "Ivanovich", "be", "kill", "by", "they", "father", "in", "a", "fit", "of", "rage", "."]
# ::pos_tags ["IN", "NNP", "CD", ",", "NNP", "POS", "JJR", "NN", "NNP", "NNP", "VBD", "VBN", "IN", "PRP$", "NN", "IN", "DT", "NN", "IN", "NN", "."]
# ::ner_tags ["O", "DATE", "DATE", "O", "PERSON", "O", "O", "O", "PERSON", "PERSON", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O"]
# ::abstract_map {}
(c0 / kill-01
:ARG0 (c2 / person
:ARG0-of (c7 / have-rel-role-91
:ARG1 (c11 / person)
:ARG2 (c13 / brother)))
:ARG1 (c1 / person
:ARG0-of (c6 / have-rel-role-91
:ARG0 c2
:ARG2 c11
:ARG2 (c12 / father))
:mod (c5 / elder
:domain c2))
:time (c3 / name
:op1 "november"
:op2 "ivan"
:op3 "1581")
:time (c4 / rage-02
:ARG1 c1))
The parser has apparently struggled with the new named entities. Because of this, the parse also contains many other errors (e.g., two ARG2 roles under c6). It has not even detected tsar "Feodor".
Here is an example of a longer sentence, this time from the US press.
# ::id sda sd
# ::snt The Grand Slam at Flushing Meadows is scheduled to begin on August 31, but with New York one of the cities hardest hit by coronavirus there are doubts over whether the tournament can take place.
# ::tokens ["The", "Grand", "Slam", "at", "Flushing", "Meadows", "is", "scheduled", "to", "begin", "on", "August", "31", ",", "but", "with", "New", "York", "one", "of", "the", "cities", "hardest", "hit", "by", "coronavirus", "there", "are", "doubts", "over", "whether", "the", "tournament", "can", "take", "place", "."]
# ::lemmas ["the", "Grand", "Slam", "at", "Flushing", "Meadows", "be", "schedule", "to", "begin", "on", "August", "31", ",", "but", "with", "New", "York", "one", "of", "the", "city", "hardest", "hit", "by", "coronavirus", "there", "be", "doubt", "over", "whether", "the", "tournament", "can", "take", "place", "."]
# ::pos_tags ["DT", "NNP", "NNP", "IN", "NNP", "NNP", "VBZ", "VBN", "TO", "VB", "IN", "NNP", "CD", ",", "CC", "IN", "NNP", "NNP", "CD", "IN", "DT", "NNS", "RBS", "VBN", "IN", "NN", "EX", "VBP", "NNS", "IN", "IN", "DT", "NN", "MD", "VB", "NN", "."]
# ::ner_tags ["O", "MISC", "MISC", "O", "LOCATION", "LOCATION", "O", "O", "O", "O", "O", "DATE", "DATE", "O", "O", "O", "STATE_OR_PROVINCE", "STATE_OR_PROVINCE", "NUMBER", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O"]
# ::abstract_map {}
(c0 / have-concession-91
:ARG1 (c1 / schedule-01
:ARG1 (c3 / begin-01)
:ARG3 (c4 / date-entity
:mod 31
:mod (c9 / august)))
:ARG2 (c2 / doubt-01
:ARG1 (c5 / possible-01
:ARG1 (c10 / take-01
:ARG1 (c12 / grand)
:ARG1 (c13 / manufacture-01)))
:ARG1-of (c6 / cause-01
:ARG0 (c11 / include-91
:ARG2 (c14 / city
:ARG1-of (c15 / hit-01
:ARG0-of (c17 / near-02
:degree (c18 / most))
:ARG2 (c16 / coronavirus)))))
:topic (c7 / slam-02
:ARG1 c12)))
Again, it has not properly recognized any named entity and, because of this (?), has also made many other errors (like Grand Slam --> "grand manufacture").
A last example, from sports.
# ::id sada
# ::snt Zverev joked that he had been persuaded to play at the tournament by a threat from Djokovic that he would never let him win against him otherwise.
# ::tokens ["Zverev", "joked", "that", "he", "had", "been", "persuaded", "to", "play", "at", "the", "tournament", "by", "a", "threat", "from", "Djokovic", "that", "he", "would", "never", "let", "him", "win", "against", "him", "otherwise", "."]
# ::lemmas ["Zverev", "joke", "that", "he", "have", "be", "persuade", "to", "play", "at", "the", "tournament", "by", "a", "threat", "from", "Djokovic", "that", "he", "would", "never", "let", "he", "win", "against", "he", "otherwise", "."]
# ::pos_tags ["NNP", "VBD", "IN", "PRP", "VBD", "VBN", "VBN", "TO", "VB", "IN", "DT", "NN", "IN", "DT", "NN", "IN", "NNP", "IN", "PRP", "MD", "RB", "VB", "PRP", "VB", "IN", "PRP", "RB", "."]
# ::ner_tags ["PERSON", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "PERSON", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O"]
# ::abstract_map {}
(c0 / joke-01
:ARG0 (c1 / person
:ARG0-of (c3 / have-org-role-91
:ARG2 (c8 / zverev))
:ARG0-of (c4 / play-01
:ARG1 (c9 / date-entity))
:ARG0-of (c5 / let-01
:ARG1 (c6 / win-01
:ARG2 c1)
:time (c7 / threaten-01
:ARG0 (c12 / company)
:ARG2 c1)
:time (c10 / otherwise
:op1 c6)
:time (c11 / ever)))
:ARG2 (c2 / persuade-01
:ARG0 c7
:ARG1 c1))
Again, none of the named entities were recognized, and the parser has hallucinated new concepts (e.g., "(c12 / company)"). The famous tennis player Djokovic does not even occur in the parse.
These sentences were just randomly sampled; all my outputs look more or less like this. Do you have any idea where the problem could be? It doesn't seem to lie in the post-processing: the named-entity errors are already present (mostly, as far as I can assess) in the parser output file ckpt.pt_test_out/ckpt.pt_test_out.pred.
Could it be because the # ::abstract_map {} is always empty?
Problem
The parser will output unbalanced quotes if the input sentence is quoted. This leads to a post-processing failure with the error penman.DecodeError: Expected ":" or "/" at position XXX.
Example .pred file:
# ::id 8553
# ::snt "@united got it right with the safety demonstration! Corporate but funny, reserved but NOT CORNY as a… http://t.co/lwOtKIEKGU"
# ::tokens ["''", "@united", "got", "it", "right", "with", "the", "safety", "demonstration", "!", "Corporate", "but", "funny", ",", "reserved", "but", "NOT", "CORNY", "as", "a", "...", "http://t.co/lwOtKIEKGU\""]
# ::lemmas ["''", "@united", "get", "it", "right", "with", "the", "safety", "demonstration", "!", "corporate", "but", "funny", ",", "reserved", "but", "NOT", "CORNY", "as", "a", "...", "http://t.co/lwotkiekgu\""]
# ::pos_tags ["''", "VBN", "VBD", "PRP", "RB", "IN", "DT", "NN", "NN", ".", "JJ", "CC", "JJ", ",", "JJ", "CC", "NNP", "NNP", "IN", "DT", ":", "NN"]
# ::ner_tags ["O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "URL"]
# ::abstract_map {}
(c0 / multi-sentence
:snt1 (c1 / contrast
:ARG1 (c3 / right-06
:ARG1 (c7 / it)
:ARG2 (c8 / demonstrate
:ARG1 (c10 / safe)))
:ARG1 (c4 / funny
:domain (c9 / string-entity
:value "http://t.co/lwotkiekgu"")))
:snt2 (c2 / contrast
:ARG1 (c5 / corporate)
:ARG2 (c6 / corny
:domain c9)))
Note the two quotes at the end of :value "http://t.co/lwotkiekgu"".
./postprocess_2.0.sh sample.txt.pred
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/perry/anaconda3/envs/stog/lib/python3.6/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/home/perry/anaconda3/envs/stog/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "/home/perry/PycharmProjects/phd/AMR-gs-master/stog/data/dataset_readers/amr_parsing/postprocess/postprocess.py", line 16, in postprocess2
for amr in nr.restore_file(file_path):
File "/home/perry/PycharmProjects/phd/AMR-gs-master/stog/data/dataset_readers/amr_parsing/postprocess/node_restore.py", line 19, in restore_file
for amr in AMRIO.read(file_path):
File "/home/perry/PycharmProjects/phd/AMR-gs-master/stog/data/dataset_readers/amr_parsing/io.py", line 48, in read
amr.graph = AMRGraph.decode(' '.join(graph_lines))
File "/home/perry/PycharmProjects/phd/AMR-gs-master/stog/data/dataset_readers/amr_parsing/amr.py", line 640, in decode
_graph = amr_codec.decode(raw_graph_string)
File "/home/perry/anaconda3/envs/stog/lib/python3.6/site-packages/penman.py", line 172, in decode
span, data = self._decode_penman_node(s)
File "/home/perry/anaconda3/envs/stog/lib/python3.6/site-packages/penman.py", line 405, in _decode_penman_node
span, data = self._decode_penman_node(s, pos=pos)
File "/home/perry/anaconda3/envs/stog/lib/python3.6/site-packages/penman.py", line 405, in _decode_penman_node
span, data = self._decode_penman_node(s, pos=pos)
File "/home/perry/anaconda3/envs/stog/lib/python3.6/site-packages/penman.py", line 405, in _decode_penman_node
span, data = self._decode_penman_node(s, pos=pos)
File "/home/perry/anaconda3/envs/stog/lib/python3.6/site-packages/penman.py", line 427, in _decode_penman_node
raise DecodeError('Expected ":" or "/"', string=s, pos=pos)
penman.DecodeError: Expected ":" or "/" at position 375
Workaround
Eliminate quotes from sentences before inputting them.
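A minimal pre-cleaning sketch along these lines (the helper name and the exact quote set are my own choices):

```python
def strip_quotes(sentence):
    """Drop straight and curly double quotes, plus the `` and '' forms
    that tokenizers emit, before feeding the sentence to the parser."""
    for q in ('"', "\u201c", "\u201d", "``", "''"):
        sentence = sentence.replace(q, "")
    # collapse any double spaces left behind
    return " ".join(sentence.split())
```

Running this over the raw input file before preprocess_raw.sh avoids the unbalanced-quote literals in the .pred output.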
Hi Deng,
first of all, great work. I wondered if you could share your "official" output files that score 78.7 and 80.2 on the test set. I'd like to do some analysis with other AMR metrics (SemBleu, s2match), and perhaps performance over sentence length.
If possible, I'd also like the test output of your model from the "core semantic first" paper, which scored 73.2.
Thanks a lot! Rik
Hello Deng, first of all thanks for the great work!
I'm trying to use your parser to process a simple input file with the pre-trained amr2.0.bert.gr
model. I followed the steps described in the AMR Parsing with Pretrained Models section up to step 3, where I get the following error:
$ python -u -m parser.work --test_data test/input.txt.raw.features.preproc --load_path test/amr2.0.bert.gr/ckpt.pt --output_suffix _test_out
To use data.metrics please install scikit-learn. See https://scikit-learn.org/stable/index.html
Traceback (most recent call last):
File "/home/waszczuk/.conda/envs/amrgs/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/waszczuk/.conda/envs/amrgs/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/waszczuk/work/github/other/AMR-gs/parser/work.py", line 101, in <module>
vocabs['tok'] = Vocab(model_args.tok_vocab, 5, [CLS])
File "/home/waszczuk/work/github/other/AMR-gs/parser/data.py", line 17, in __init__
for line in open(filename).readlines():
FileNotFoundError: [Errno 2] No such file or directory: '../data/AMR/amr_2.0_reca/tok_vocab'
Am I doing something wrong? Maybe the argument I provide for the load_path parameter is not correct?
Hi Deng,
First congratulations on your ACL work.
I'm quite new to AMR and just want to use it as a tool to extract structured information for further use.
I'm wondering whether it is possible to transfer your work to Chinese and train an AMR parsing model for Chinese. If so, could you give me some help or suggestions on how to achieve that goal?
Thanks a lot.
Kind regards,
Haochun
Hi, I'm trying to use the pretrained model to parse text into AMR graphs. I ran the Stanford CoreNLP server and the data preprocessing was successful (test.txt.raw.features.preproc was generated).
However, when I run work.sh, this strange bug appears and I have no clue why it happens. The CUDA error still shows up even when I try to run on CPU. Do you have any advice?
my work.sh looks like this:
python3 -u -m parser.work --test_data test.txt.raw.features.preproc \
    --test_batch_size 6666 \
    --load_path ckpt/ckpt.pt \
    --beam_size 8 \
    --alpha 0.6 \
    --max_time_step 100 \
    --output_suffix _test_out
Error log:
sh work.sh
To use data.metrics please install scikit-learn. See https://scikit-learn.org/stable/index.html
read from test.txt.raw.features.preproc, 2 amrs
Get 2 AMRs from test.txt.raw.features.preproc
ckpt/ckpt.pt
Traceback (most recent call last):
File "/media/ntu/volume1/home/s122md301_07/anaconda3/envs/AMR-gs/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/media/ntu/volume1/home/s122md301_07/anaconda3/envs/AMR-gs/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/media/ntu/volume1/home/s122md301_07/AMR-gs/parser/work.py", line 145, in <module>
parse_data(model, pp, another_test_data, args.test_data, test_model+args.output_suffix, args.beam_size, args.alpha, args.max_time_step)
File "/media/ntu/volume1/home/s122md301_07/AMR-gs/parser/work.py", line 68, in parse_data
res = parse_batch(model, batch, beam_size, alpha, max_time_step)
File "/media/ntu/volume1/home/s122md301_07/AMR-gs/parser/work.py", line 42, in parse_batch
beams = model.work(batch, beam_size, max_time_step)
File "/media/ntu/volume1/home/s122md301_07/AMR-gs/parser/parser.py", line 83, in work
word_repr, word_mask, probe = self.encode_step_with_bert(data['tok'], data['lem'], data['pos'], data['ner'], data['word_char'], data['bert_token'], data['token_subword_index'])
File "/media/ntu/volume1/home/s122md301_07/AMR-gs/parser/parser.py", line 65, in encode_step_with_bert
bert_embed, _ = self.bert_encoder(bert_token, token_subword_index=token_subword_index)
File "/media/ntu/volume1/home/s122md301_07/anaconda3/envs/AMR-gs/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/media/ntu/volume1/home/s122md301_07/AMR-gs/parser/bert_utils.py", line 55, in forward
input_ids, attention_mask, token_type_ids)
File "/media/ntu/volume1/home/s122md301_07/anaconda3/envs/AMR-gs/lib/python3.6/site-packages/transformers/modeling_bert.py", line 627, in forward
head_mask=head_mask)
File "/media/ntu/volume1/home/s122md301_07/anaconda3/envs/AMR-gs/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/media/ntu/volume1/home/s122md301_07/anaconda3/envs/AMR-gs/lib/python3.6/site-packages/transformers/modeling_bert.py", line 348, in forward
layer_outputs = layer_module(hidden_states, attention_mask, head_mask[i])
File "/media/ntu/volume1/home/s122md301_07/anaconda3/envs/AMR-gs/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/media/ntu/volume1/home/s122md301_07/anaconda3/envs/AMR-gs/lib/python3.6/site-packages/transformers/modeling_bert.py", line 326, in forward
attention_outputs = self.attention(hidden_states, attention_mask, head_mask)
File "/media/ntu/volume1/home/s122md301_07/anaconda3/envs/AMR-gs/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/media/ntu/volume1/home/s122md301_07/anaconda3/envs/AMR-gs/lib/python3.6/site-packages/transformers/modeling_bert.py", line 283, in forward
self_outputs = self.self(input_tensor, attention_mask, head_mask)
File "/media/ntu/volume1/home/s122md301_07/anaconda3/envs/AMR-gs/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/media/ntu/volume1/home/s122md301_07/anaconda3/envs/AMR-gs/lib/python3.6/site-packages/transformers/modeling_bert.py", line 202, in forward
mixed_query_layer = self.query(hidden_states)
File "/media/ntu/volume1/home/s122md301_07/anaconda3/envs/AMR-gs/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/media/ntu/volume1/home/s122md301_07/anaconda3/envs/AMR-gs/lib/python3.6/site-packages/torch/nn/modules/linear.py", line 87, in forward
return F.linear(input, self.weight, self.bias)
File "/media/ntu/volume1/home/s122md301_07/anaconda3/envs/AMR-gs/lib/python3.6/site-packages/torch/nn/functional.py", line 1371, in linear
output = input.matmul(weight.t())
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)
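For what it's worth, CUBLAS_STATUS_EXECUTION_FAILED inside a matmul is very often an out-of-memory symptom, and --test_batch_size 6666 is large. A generic retry-with-smaller-batches wrapper one could put around the parse call (a sketch; the names and size ladder are mine, not from this repo):

```python
def run_with_smaller_batches(run_fn, batch_sizes=(6666, 2000, 500, 100)):
    """Try `run_fn(batch_size)` with progressively smaller batch sizes,
    falling back whenever a RuntimeError (e.g. a CUDA/CUBLAS failure) occurs."""
    last_err = None
    for bs in batch_sizes:
        try:
            return run_fn(bs)
        except RuntimeError as err:
            last_err = err
    raise last_err
```

If even the smallest size fails on CPU as well, the problem is more likely a mismatched input tensor than memory, but halving the batch size is the cheap first experiment.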
Hi!
I would like to try the AMR parser with the pretrained model.
I successfully started the Stanford CoreNLP server, then ran the preprocess_raw.sh input.txt command. It fails when executing:
python -u -m stog.data.dataset_readers.amr_parsing.preprocess.feature_annotator \
${raw_file}.raw \
--compound_file ${compound_file}
I'm running the code in a Docker container using Python 3.6.13 with all dependencies from requirements.txt. The input.txt file contains two simple sentences. The traceback is pasted below.
Am I missing something?
root@d92e2317f450:/app# sh preprocess_raw.sh input.txt
Traceback (most recent call last):
File "/usr/local/lib/python3.6/runpy.py", line 183, in _run_module_as_main
mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
File "/usr/local/lib/python3.6/runpy.py", line 109, in _get_module_details
__import__(pkg_name)
File "/app/stog/data/__init__.py", line 1, in <module>
from stog.data.dataset_readers.dataset_reader import DatasetReader
File "/app/stog/data/dataset_readers/__init__.py", line 10, in <module>
from stog.data.dataset_readers.dataset_reader import DatasetReader
File "/app/stog/data/dataset_readers/dataset_reader.py", line 4, in <module>
from stog.data.instance import Instance
File "/app/stog/data/instance.py", line 3, in <module>
from stog.data.fields.field import DataArray, Field
File "/app/stog/data/fields/__init__.py", line 7, in <module>
from stog.data.fields.array_field import ArrayField
File "/app/stog/data/fields/array_field.py", line 10, in <module>
class ArrayField(Field[numpy.ndarray]):
File "/app/stog/data/fields/array_field.py", line 42, in ArrayField
@overrides
File "/usr/local/lib/python3.6/site-packages/overrides/overrides.py", line 88, in overrides
return _overrides(method, check_signature, check_at_runtime)
File "/usr/local/lib/python3.6/site-packages/overrides/overrides.py", line 114, in _overrides
_validate_method(method, super_class, check_signature)
File "/usr/local/lib/python3.6/site-packages/overrides/overrides.py", line 135, in _validate_method
ensure_signature_is_compatible(super_method, method, is_static)
File "/usr/local/lib/python3.6/site-packages/overrides/signature.py", line 93, in ensure_signature_is_compatible
ensure_return_type_compatibility(super_type_hints, sub_type_hints, method_name)
File "/usr/local/lib/python3.6/site-packages/overrides/signature.py", line 288, in ensure_return_type_compatibility
f"{method_name}: return type `{sub_return}` is not a `{super_return}`."
TypeError: ArrayField.empty_field: return type `None` is not a `stog.data.fields.field.Field`.
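In case someone else hits this: the failure comes from the overrides package's stricter signature checking in newer releases, which rejects stog's ArrayField.empty_field return annotation. Pinning an older release is a common workaround for stog-era code; the exact version below is an assumption on my part, so verify it against the repo's requirements.txt:

```text
# requirements.txt fragment (assumed pin; newer `overrides` releases
# added the strict signature checks seen in the traceback above)
overrides==3.1.0
```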
When I run the command sh annotate_features.sh data/AMR/amr_2.0, it reports an error: TypeError: string indices must be integers.
Details as follows:
File "../AMR-gs/stog/data/dataset_readers/amr_parsing/preprocess/feature_annotator.py", line 205, in <module>
annotation = annotator(amr.sentence)
File "../AMR-gs/stog/data/dataset_readers/amr_parsing/preprocess/feature_annotator.py", line 75, in __call__
annotation = self.annotate(text)
File "../AMR-gs/stog/data/dataset_readers/amr_parsing/preprocess/feature_annotator.py", line 63, in annotate
tokens = self.nlp.annotate(text.strip(), self.nlp_properties)['sentences'][0]['tokens']
TypeError: string indices must be integers
What could cause this? Looking forward to your reply, thank you very much!
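This TypeError typically means self.nlp.annotate(...) returned a plain string (for example, a server error message, or JSON that was never parsed) rather than a dict, so ['sentences'] ends up indexing into characters. A defensive sketch (the helper is mine; it assumes the CoreNLP JSON response shape):

```python
import json

def first_sentence_tokens(annotation):
    """Accept either a parsed dict or the raw JSON string that some
    CoreNLP wrappers return, and pull out the first sentence's tokens."""
    if isinstance(annotation, str):
        # raises ValueError if the server actually sent an error page
        annotation = json.loads(annotation)
    return annotation["sentences"][0]["tokens"]
```

If json.loads fails here, the server's reply is the thing to inspect (e.g. whether it was started with the JSON output format and whether it is reachable at the expected port).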
I got an error when running sh preprocess_raw.sh sentence.txt:
Recategorizing subgraphs... Fri Aug 6 23:57:10 CST 2021
[2021-08-06 23:57:11,707 INFO] Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex .
Traceback (most recent call last):
File "/data/leon/miniconda3/envs/amrgs/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/data/leon/miniconda3/envs/amrgs/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/data/leon/LP4EE/RAMS_1.0/AMR-gs-master/stog/data/dataset_readers/amr_parsing/preprocess/text_anonymizor.py", line 250, in <module>
amr.abstract_map = text_anonymizor(amr)
File "/data/leon/LP4EE/RAMS_1.0/AMR-gs-master/stog/data/dataset_readers/amr_parsing/preprocess/text_anonymizor.py", line 61, in __call__
amr, text_map, max_length, anonym_type, pos_tag
File "/data/leon/LP4EE/RAMS_1.0/AMR-gs-master/stog/data/dataset_readers/amr_parsing/preprocess/text_anonymizor.py", line 83, in _abstract
collected_entities,
File "/data/leon/LP4EE/RAMS_1.0/AMR-gs-master/stog/data/dataset_readers/amr_parsing/preprocess/text_anonymizor.py", line 181, in _replace_span
if length == 1 and self._leave_as_is(start, amr, text_map, anonym_type):
File "/data/leon/LP4EE/RAMS_1.0/AMR-gs-master/stog/data/dataset_readers/amr_parsing/preprocess/text_anonymizor.py", line 143, in _leave_as_is
if is_anonym_type(index, amr, text_map, ['DATE_ATTRS']) and next_token_is(index, 1, amr, r"^''$"):
File "/data/leon/LP4EE/RAMS_1.0/AMR-gs-master/stog/data/dataset_readers/amr_parsing/preprocess/text_anonymizor.py", line 28, in is_anonym_type
return lemma in text_map and text_map[lemma]['ner'] in types
The text_map above is loaded by
text_anonymizor = TextAnonymizor.from_json(os.path.join(args.util_dir, "text_anonymization_rules.json"))
so I analyzed the JSON file and found two wrong keys in it:
"span": "Chinese" in the anonym_type of named-entity, and
"span": "2002 08 28" in the anonym_type of date-entity.
After I deleted them, the error above no longer appears.
I guess the "span" keys were mis-written in text_anonymization_rules.json; is that right?
To reproduce, run ./preprocess.sh on any sentence, e.g. "Short span of time".
Error:
File "/stog/data/dataset_readers/amr_parsing/preprocess/text_anonymizor.py", line 26, in is_anonym_type
return lemma in text_map and text_map[lemma]['ner'] in types
TypeError: string indices must be integers
Reason: text_map[lemma] turns out to be a string ('Chinese' or '2002 08 28') rather than a dict.
Fix: remove "span": "Chinese" and "span": "2002 08 28" from data/AMR/amr_2.0_utils/text_anonymization_rules.json.
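Equivalent to the manual deletion, the malformed entries can be dropped programmatically: well-formed rule entries map a phrase to a dict carrying an 'ner' field, while the broken "span" keys map to a bare string. A sketch (the helper name is mine):

```python
def drop_malformed_entries(text_map):
    """Keep only entries whose value is a dict (well-formed rules);
    the stray "span": "Chinese" / "span": "2002 08 28" pairs map to strings
    and make text_map[lemma]['ner'] blow up with the TypeError above."""
    return {k: v for k, v in text_map.items() if isinstance(v, dict)}
```

Applying this to the loaded rules before constructing the anonymizer avoids editing the shipped JSON file by hand.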