sheng-z / stog
AMR Parsing as Sequence-to-Graph Transduction
License: MIT License
Hello, I have a question about the code: could you tell me what source_dynamic_vocab_size and target_dynamic_vocab_size are? Thanks a lot.
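For context (my own reading, not the authors' definition): in pointer-generator-style parsers, `source_dynamic_vocab_size` is typically the size of a per-instance vocabulary built from the source tokens so the decoder can copy them, and `target_dynamic_vocab_size` plays the same role for previously generated target nodes. A minimal illustrative sketch of such a per-instance vocabulary (names are hypothetical, not stog's actual code):

```python
def build_dynamic_vocab(tokens):
    """Map each distinct token in one instance to a per-instance copy index.

    Index 0 is reserved for "no copy" (an illustrative convention); the
    dynamic vocab size is then the number of distinct tokens plus that
    reserved slot. This is a sketch, not the stog implementation.
    """
    vocab = {}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab) + 1  # 0 is reserved for "no copy"
    return vocab, len(vocab) + 1

# Duplicated tokens share one copy index, so the size varies per sentence.
vocab, size = build_dynamic_vocab(["the", "boy", "wants", "the", "girl"])
```

Because the vocabulary is rebuilt per instance, these "sizes" are dynamic: they change from sentence to sentence rather than being fixed model hyperparameters.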
At the prediction step, if the input file contains only one sentence, it results in an error. For example, after all the preprocessing was applied to the AMR 2.0 (LDC2017T10) data (following the steps in README.md), there is a preprocessed test data file test.preproc in the data/AMR/amr_2.0/ directory. However, if I removed every "paragraph" (that is, a pair of comment lines (id, snt, tags, etc.) and an AMR graph) except the first one from the file, and then ran the "6. Prediction" phase, the program showed the following error:
Traceback (most recent call last):
File ".../anaconda/envs/stog/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File ".../anaconda/envs/stog/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File ".../stog/stog/commands/predict.py", line 252, in <module>
_predict(args)
File ".../stog/stog/commands/predict.py", line 208, in _predict
manager.run()
File ".../stog/stog/commands/predict.py", line 182, in run
for model_input_instance, result in zip(batch, self._predict_instances(batch)):
File ".../stog/stog/commands/predict.py", line 143, in _predict_instances
results = [self._predictor.predict_instance(batch_data[0])]
File ".../stog/stog/predictors/predictor.py", line 46, in predict_instance
outputs = self._model.forward_on_instance(instance)
File ".../stog/stog/models/model.py", line 117, in forward_on_instance
raise NotImplementedError
NotImplementedError
This error occurs whenever an input file with only one AMR paragraph is given to the --input-file option of the stog.commands.predict module. If the input file contains two or more paragraphs, the error does not seem to occur.
I tried to run the evaluation script (tools/amr-evaluation-tool-enhanced/evaluation.sh), but when it is run in an environment without Python 2 (that is, no python2 executable exists in PATH), it fails with "python2: command not found". This message was also shown during training. Could you note in README.md that the evaluation requires Python 2?
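A small guard at the top of evaluation.sh would also make the failure obvious. This is a suggestion, not part of the repo; `require_cmd` is a hypothetical helper name:

```shell
# require_cmd NAME MSG: return 0 if NAME is on PATH, else print MSG and return 1.
# evaluation.sh could call `require_cmd python2 "evaluation.sh requires Python 2"`
# before doing any work, so the script fails fast with a clear message.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "$2" >&2
  return 1
}
```

Called as `require_cmd python2 "evaluation.sh requires Python 2" || exit 1`, this turns the cryptic "python2: command not found" into an explicit error.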
Hi Sheng,
Thanks for your nice work. The conversion to dependency parsing is really insightful.
One question: I find your named entity metric to be low, and yet the wikification accuracy is higher than the named entity score. Do you know what is happening? Is the named entity recall low, or are there issues with surface string processing?
Chunchuan
Hi!
With your statement in the README, "Make sure that you have at least two GeForce GTX TITAN X GPUs to train the full model.", do you mean that you require this setup specifically, or do you just require GPU-memory equivalent to, or greater than, two GeForce GTX TITAN X GPUs? Or is there some other interpretation/requirement completely?
Thanks in advance,
Simon
Hi Sheng,
Please see the attached log for more details.
(stog) lfsong@c47:tool.amr_parsing_stog$ ./scripts/annotate_features.sh data/AMR/amr_2.0
[2019-09-05 06:18:53,869 INFO] Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex .
[2019-09-05 06:18:53,885 INFO] Processing data/AMR/amr_2.0/test.txt
Traceback (most recent call last):
File "/data/home/lfsong/anaconda3/envs/stog/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/data/home/lfsong/anaconda3/envs/stog/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/data2/lfsong/tool.amr_parsing_stog/stog/data/dataset_readers/amr_parsing/preprocess/feature_annotator.py", line 205, in <module>
annotation = annotator(amr.sentence)
File "/data2/lfsong/tool.amr_parsing_stog/stog/data/dataset_readers/amr_parsing/preprocess/feature_annotator.py", line 75, in __call__
annotation = self.annotate(text)
File "/data2/lfsong/tool.amr_parsing_stog/stog/data/dataset_readers/amr_parsing/preprocess/feature_annotator.py", line 63, in annotate
tokens = self.nlp.annotate(text.strip(), self.nlp_properties)['sentences'][0]['tokens']
TypeError: string indices must be integers
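Not the authors' code, but a guess at the cause: CoreNLP wrappers commonly return the server's raw error text (a string) instead of parsed JSON when the server is unreachable or times out, so indexing the result with `['sentences']` raises exactly this TypeError. A defensive wrapper might look like this (`annotate` here stands for any CoreNLP-style callable; this helper is hypothetical, not part of stog):

```python
def safe_annotate(annotate, text, properties):
    """Call a CoreNLP-style annotate() and fail loudly if it returns raw text.

    `annotate` is any callable that returns parsed JSON (a dict) on success
    and the server's error string on failure -- the behaviour implied by the
    TypeError above. Hypothetical helper, not part of stog.
    """
    result = annotate(text.strip(), properties)
    if not isinstance(result, dict):
        raise RuntimeError(
            "CoreNLP did not return JSON -- is the server running? "
            "Server said: %r" % (result[:200],))
    return result["sentences"][0]["tokens"]
```

If this is the cause, checking that the CoreNLP server is up (and reachable at the configured port) before running annotate_features.sh should make the error go away.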
Currently it is: ./script/postprocess_2.0.sh test.pred.txt
It should be: ./scripts/postprocess_2.0.sh test.pred.txt
In the README it's ./scripts/annotate_features.sh, but the actual file name is annnotate_features.sh.
Excuse me,
I want to parse another file, which includes some sentences, for example:
Can you do push-ups ?
Of course I can . It's a piece of cake ! Believe it or not , I can do 30 push-ups a minute .
Really ? I think that's impossible !
You mean 30 push-ups ?
It's easy . If you do exercise everyday , you can make it , too .
I would really appreciate it if you could give me some advice! Thanks!
Is there any chance you will release a trained version of the model?
Hi, thanks for your nice work. I want to accomplish a simple task.
If I just have a file as follows:
sentence1
sentence2
....
How can I use the pre-trained model to generate AMRs for these sentences directly?
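There is no official plain-text interface that I know of, but one workaround is to wrap each plain sentence in the AMR corpus file format with a placeholder graph, so the existing preprocessing and prediction pipeline (Steps 3-6) accepts the file. A sketch (the `::id`/`::snt` fields follow AMR corpus conventions; the dummy graph and helper name are my own):

```python
def sentences_to_amr_stub(sentences):
    """Format plain sentences as AMR-corpus-style blocks with dummy graphs.

    Each block has the metadata lines the preprocessing scripts expect
    (# ::id, # ::snt) followed by a placeholder graph that prediction
    will overwrite. This is a workaround sketch, not part of stog.
    """
    blocks = []
    for i, snt in enumerate(sentences):
        blocks.append(
            "# ::id sent.{i}\n# ::snt {snt}\n(d / dummy)".format(i=i, snt=snt.strip())
        )
    return "\n\n".join(blocks) + "\n"
```

Writing the result to a file and feeding it through feature annotation, preprocessing, and prediction should then work the same as for the corpus test set.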
Hi,
During prediction I got this error; have you seen it before?
File "/home/yaosw/anaconda3/envs/stog/lib/python3.6/site-packages/penman.py", line 250, in encode
return self._encode_penman(g, top=top)
File "/home/yaosw/anaconda3/envs/stog/lib/python3.6/site-packages/penman.py", line 493, in _encode_penman
raise EncodeError('Invalid graph; possibly disconnected.')
penman.EncodeError: Invalid graph; possibly disconnected.
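Not a fix inside stog, but for debugging it can help to check whether a predicted graph's triples form a single connected component before handing them to penman. A generic sketch with no penman dependency (triples are `(source, role, target)`; attribute constants are treated as plain nodes here for simplicity):

```python
from collections import defaultdict

def is_connected(triples):
    """Return True if the undirected graph over (src, role, tgt) triples
    is a single connected component -- the property penman's
    'Invalid graph; possibly disconnected' error complains about.
    Illustrative check, not penman's actual implementation."""
    nodes = set()
    adj = defaultdict(set)
    for src, _role, tgt in triples:
        nodes.add(src)
        nodes.add(tgt)
        adj[src].add(tgt)
        adj[tgt].add(src)
    if not nodes:
        return True
    seen = set()
    stack = [next(iter(nodes))]
    while stack:
        n = stack.pop()
        if n in seen:
            continue
        seen.add(n)
        stack.extend(adj[n] - seen)
    return seen == nodes
```

Filtering out (or logging) predictions that fail this check lets the rest of the batch encode normally instead of crashing.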
Can anyone provide me with a script for using the pre-trained models? What do you mean by "To use them for prediction, simply download & unzip them, and then run Step 6-8"?
Has anyone tried to train on the new, larger, AMR 3.0 corpus? Any plans to do so? I'm curious if this improves the overall score. I'm tempted to dig into this but I only have a single TitanX card to train on and the corpus isn't readily available to me. If someone else is already working on this I may prefer to wait to hear your results rather than dig into it myself. Let us know.
Hi Sheng,
Thanks for your nice work. Can you offer the scripts or methods to generate the "amr_2.0_utils" on the other dataset? Thank you very much!
Let me ask a question. Is there any way to predict an AMR graph for the sentences not in AMR corpus? The README describes how to run predictions on the test sentences in AMR corpus, but I couldn't find ways to apply it to another sentence, using the trained parameters. As far as I can see, the prediction consists of several steps: given any AMR files, do feature annotations (step 3.), apply preprocess to the graph (step 4.), and do the prediction (step 6.). However, according to these steps, it seems that a sentence needs to be accompanied with AMR graph. Am I misunderstanding something? Is there any way to do prediction on any other sentences, either on command line or calling the Python codes?
I'm trying to train the AMR parser on subsets of AMR 2.0 in order to see how performance scales with the number of training examples. Normally, it works just fine. But for some subsets, I end up encountering errors. Below is the output I get when training on just 75% of the AMR 2.0 training data. I've attached a list of the training example ids in case it is helpful.
# Step 1
python -u -m stog.commands.train data/subsets/train_75_s0/config.yaml &> RUN75_STEP1.log
# Step 2
python -u -m stog.commands.predict --archive-file data/subsets/train_75_s0/ckpt-dir \
    --weights-file data/subsets/train_75_s0/ckpt-dir/best.th \
    --input-file data/subsets/train_75_s0/test.txt.features.preproc \
    --batch-size 32 \
    --use-dataset-reader \
    --cuda-device 0 \
    --output-file data/subsets/train_75_s0/test.pred.txt \
    --silent \
    --beam-size 5 \
    --predictor STOG
# Step 3
./scripts/postprocess_2.0.sh data/subsets/train_75_s0/test.pred.txt
# Step 4
./scripts/compute_smatch.sh data/subsets/train_75_s0/test.pred.txt data/AMR/amr_2.0/test.txt
# Step 4 output
Error in parsing AMR (vv1 / beyond:domain (vv2 / suppose-02:ARG1 (vv3 / i):ARG2 (vv4 / keep-01:ARG0 vv3:ARG1 (vv6 / horse:mod 1/
Traceback (most recent call last):
File "smatch/smatch.py", line 837, in
main(args)
File "smatch/smatch.py", line 737, in main
amr1.rename_node(prefix1)
AttributeError: 'NoneType' object has no attribute 'rename_node'
Smatch -> P: , R: , F:
Error in parsing AMR (vv1 / beyond:label (vv2 / suppose-02:label (vv3 / i):label (vv4 / keep-01:label vv3:label (vv6 / horse:label 1/
Traceback (most recent call last):
File "smatch/smatch.py", line 837, in
main(args)
File "smatch/smatch.py", line 737, in main
amr1.rename_node(prefix1)
AttributeError: 'NoneType' object has no attribute 'rename_node'
Unlabeled -> P: , R: , F:
Error in parsing AMR (vv1 / beyond:domain (vv2 / suppose-01:ARG1 (vv3 / i):ARG2 (vv4 / keep-01:ARG0 vv3:ARG1 (vv6 / horse:mod 1/
Traceback (most recent call last):
File "smatch/smatch.py", line 837, in
main(args)
File "smatch/smatch.py", line 737, in main
amr1.rename_node(prefix1)
AttributeError: 'NoneType' object has no attribute 'rename_node'
No WSD -> P: , R: , F:
Error in parsing AMR (vv1 / beyond :domain (vv2 / suppose-02 :ARG1 (vv3 / i) :ARG2 (vv4 / keep-01 :ARG0 vv3 :ARG1 (vv6 / horse :mod 1/
Traceback (most recent call last):
File "scores.py", line 121, in
dict_pred = var2concept(amr_pred)
File "scores.py", line 103, in var2concept
for n, v in zip(amr.nodes, amr.node_values):
AttributeError: 'NoneType' object has no attribute 'nodes'
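The AMRs in the error messages above are truncated (the parentheses never close), which is why smatch's parser returns None. A quick sanity check before running the evaluation can flag such predictions early; this helper is illustrative, not part of smatch or stog:

```python
def amr_parens_balanced(amr_str):
    """Return True if every '(' in an AMR string has a matching ')'.

    Truncated predictions like the ones in the smatch errors above fail
    this check, so they can be reported (or repaired) before evaluation
    instead of crashing smatch with a NoneType error.
    """
    depth = 0
    for ch in amr_str:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' with no matching '('
                return False
    return depth == 0
```

Running this over each predicted graph and counting failures would show whether the smaller training subsets are producing malformed outputs, or whether something in postprocessing is cutting graphs short.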
Hi, I noticed you're using the Penman package in your code. I expect that the next version will break some backward compatibility, so I recommend you pin the current latest version in your requirements.txt:
penman==0.6.2
If not, how can one add the src-side and tgt-side attention probability distributions together?
Wrong post
First, thanks for sharing; it's really helpful. Could I ask how to generate the semantic graphs for other datasets like WMT14 or IWSLT2014? Thanks in advance!