Comments (5)
Hi @Wangpeiyi9979 !
This requires a few edits to the code as it stands. This is not an exhaustive list of what you might need to do, but here are a few tips:
- Add the pre-trained model to /data/AMR.
- Edit the pre-trained model's config.json file, as the filepaths in it are wrong. They should all begin with /data.
- Write some code to convert your sentences into the following format (entries are separated by a blank line):
# AMR release; corpus: lpp; section: dev; number of AMRs: 3 (generated on Fri Nov 1, 2019 at 21:03:52)
# ::id lpp_1943.1 ::date 2019-11-18 14:58:12.957282 ::annotator Annotator ::preferred
# ::snt This is a sentence.
# ::save-date 2019-11-12 12:24:17.523046
(d / dummy)
# ::id separator_id_of_result ::date 2019-11-18 14:58:12.957282 ::annotator Blitzy ::preferred
# ::snt This is the next sentence.
# ::save-date 2019-11-12 12:24:17.523046
(d / dummy)
- Then you should be able to run everything in this order: your code that converts the sentence file to AMR format, prepare-data, feature-annotation, data-preprocessing, data-postprocessing.
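For the config.json tip, a small helper can list which values still need editing by hand. This is only a sketch: the name `find_paths_outside_data` is made up here, and treating any string that contains `/` as a file path is an assumption about the config, not something taken from the stog code. Pass the path to your config.json as the first argument.

```python
import json
import sys


def find_paths_outside_data(config, prefix="/data", trail=""):
    """Return (key_trail, value) pairs for path-like strings not under `prefix`.

    Assumption: any string value containing '/' is a file path.
    """
    hits = []
    if isinstance(config, dict):
        for key, value in config.items():
            child = f"{trail}.{key}" if trail else str(key)
            hits += find_paths_outside_data(value, prefix, child)
    elif isinstance(config, list):
        for index, value in enumerate(config):
            hits += find_paths_outside_data(value, prefix, f"{trail}[{index}]")
    elif isinstance(config, str) and "/" in config:
        under_prefix = config == prefix or config.startswith(prefix + "/")
        if not under_prefix:
            hits.append((trail, config))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # sys.argv[1] is the path to the pre-trained model's config.json
    with open(sys.argv[1]) as f:
        for key_trail, value in find_paths_outside_data(json.load(f)):
            print(f"{key_trail}: {value}")
```

It only reports the offending entries and does not rewrite anything, so you stay in control of what each path is changed to.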
Best of luck! :)
from stog.
Here's a simple script to convert sentences to an AMR format.
infn = 'data/sents.txt'
outfn = 'data/sents.txt.amr'
# Load the file
print('Reading ', infn)
sents = []
with open(infn) as f:
    for line in f.readlines():
        line = line.strip()
        if not line: continue
        sents.append(line)
# Create a dummy amr file
print('Writing ', outfn)
with open(outfn, 'w') as f:
    for i, sent in enumerate(sents):
        f.write('# ::id sents_id.%d\n' % i)
        f.write('# ::snt %s\n' % sent)
        f.write('(d / dummy)\n')
        f.write('\n')
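Before moving on, it may be worth reading the generated file back to confirm every sentence made it through. A minimal sketch, assuming the output path used by the script; `read_dummy_amr` is a hypothetical helper name, not part of stog:

```python
import os


def read_dummy_amr(lines):
    """Collect (id, sentence) pairs from '# ::id' and '# ::snt' comment lines."""
    entries = []
    current_id = None
    for line in lines:
        line = line.strip()
        if line.startswith('# ::id '):
            # Keep only the id itself; drop trailing fields such as ' ::date ...'
            current_id = line[len('# ::id '):].split(' ::')[0]
        elif line.startswith('# ::snt '):
            entries.append((current_id, line[len('# ::snt '):]))
    return entries


if __name__ == '__main__':
    outfn = 'data/sents.txt.amr'  # same path the conversion script writes to
    if os.path.exists(outfn):
        with open(outfn) as f:
            pairs = read_dummy_amr(f)
        print('Found %d entries' % len(pairs))
        for sent_id, sent in pairs[:3]:
            print(sent_id, '->', sent)
```

The entry count should match the number of non-empty lines in data/sents.txt.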
After you do this you still need to...
- Run the feature annotator script on the file. This is Readme step 3. You'll need to edit the script and remove {dev_data} and {train_data} and replace test_data with the name of your new file.
- Run the preprocessing script on it (Readme step 4), with similar modifications to the script as above. Note that the script only runs the "text_anonymizer" on the test data. The "recategorizer" does not need to run, since it is only run on the train and dev data.
- Run the model to do prediction (Readme step 6)
- Run the post-processor (Readme step 7). I would recommend commenting out the "Wikification" section. It's a little more complicated to get this working, and the online Spotlight server that the script uses is very unreliable.
The annotator script (see Readme step 3) creates the tokens, lemmas, and tags. It uses the Stanford NLP system. Using NLTK to annotate will likely give you less than optimal results, since the internals are all set up to work with the Stanford Named-Entity tags, not NLTK's (and NLTK is a fairly poor parser).
Also, don't forget to run the other pre-processing step before generating, otherwise things won't generate correctly (even though it will probably run without an error).
@SimonWesterlind Thanks for answering. If you have a script to convert the sentences to the given format, please share it with us :)
@bjascob Thanks for the script! :) 👍 I had to edit the script to make it work; you can view my version at https://github.com/gauravghati/stog/blob/master/scripts/sentence-amr.py
The format needed for the input is:
# ::id sents_id.4
# ::snt Zero is a beautiful number.
# ::tokens ['Zero', 'is', 'a', 'beautiful', 'number', '.']
# ::lemmas ['Zero', 'is', 'a', 'beautiful', 'number', '.']
# ::pos_tags ['NNP', 'VBZ', 'DT', 'JJ', 'NN', '.']
# ::ner_tags ['GPE', 'O', 'O', 'O', 'O', 'O']
(d / dummy)
# ::id sents_id.1
# ::snt But that did not really surprise me much .
# ::tokens ['But', 'that', 'did', 'not', 'really', 'surprise', 'me', 'much', '.']
# ::lemmas ['But', 'that', 'did', 'not', 'really', 'surprise', 'me', 'much', '.']
# ::pos_tags ['CC', 'DT', 'VBD', 'RB', 'RB', 'VB', 'PRP', 'JJ', '.']
# ::ner_tags ['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O']
(d / dummy)
The tokens, lemmas, pos_tags, and ner_tags fields are also needed in the input!
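To check that an annotated file really carries all of these fields, something like the following could help. It is a rough sketch, not part of stog: `parse_entries` and `check_entry` are hypothetical names, and it assumes each list field is written as a Python or JSON style list literal, as in the example above.

```python
import ast

REQUIRED = ("id", "snt", "tokens", "lemmas", "pos_tags", "ner_tags")
LIST_FIELDS = ("tokens", "lemmas", "pos_tags", "ner_tags")


def parse_entries(text):
    """Split AMR-style text into one dict of '# ::key value' metadata per graph."""
    entries, meta = [], {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# ::"):
            key, _, value = line[4:].partition(" ")
            meta[key] = value
        elif line.startswith("(") and meta:
            # The first line of a graph closes the metadata block before it
            entries.append(meta)
            meta = {}
    return entries


def check_entry(meta):
    """Return a list of problems found in one entry's metadata."""
    problems = [f"missing ::{key}" for key in REQUIRED if key not in meta]
    lengths = {}
    for key in LIST_FIELDS:
        if key in meta:
            try:
                lengths[key] = len(ast.literal_eval(meta[key]))
            except (ValueError, SyntaxError, TypeError):
                problems.append(f"::{key} is not a list literal")
    # Tokens, lemmas and tags should line up one-to-one
    if len(set(lengths.values())) > 1:
        problems.append(f"list lengths differ: {lengths}")
    return problems
```

Calling `check_entry` on each dict returned by `parse_entries(open(path).read())` gives an empty list for entries that look complete.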