dair-iitd / openie6

OpenIE6 system

License: GNU General Public License v3.0

openie6's Introduction

OpenIE6 System

This repository contains the code for the paper:
OpenIE6: Iterative Grid Labeling and Coordination Analysis for Open Information Extraction
Keshav Kolluru*, Vaibhav Adlakha*, Samarth Aggarwal, Mausam and Soumen Chakrabarti
EMNLP 2020

* denotes equal contribution

Installation

conda create -n openie6 python=3.6
conda activate openie6
pip install -r requirements.txt
python -m nltk.downloader stopwords
python -m nltk.downloader punkt

All results were obtained on a V100 GPU with CUDA 10.0.

NOTE: HuggingFace transformers==2.6.0 is required. Newer versions include a breaking change in the way the tokenizer is used by this code. They will not raise an error but will give wrong results!
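Because a mismatched transformers version fails silently, a startup guard can make the problem visible. The following is a minimal sketch, not part of the repository: `check_pinned_version` is a hypothetical helper, and the usage comment assumes the installed version is exposed as `transformers.__version__`.

```python
REQUIRED_TRANSFORMERS = "2.6.0"  # version pinned by the README


def check_pinned_version(installed, required=REQUIRED_TRANSFORMERS):
    """Raise RuntimeError if the installed version string differs from the pin."""
    if installed.strip() != required:
        raise RuntimeError(
            "transformers=={} is required, found {}; "
            "other versions may silently give wrong results".format(required, installed)
        )
    return True


# Usage (assumes transformers is importable in the active environment):
#   import transformers
#   check_pinned_version(transformers.__version__)
```

Failing loudly at startup is a deliberate choice here, since the README warns that the alternative is wrong output with no error.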

Download Resources

Download Data (50 MB)

zenodo_get 4054476
tar -xvf openie6_data.tar.gz

Download Models (6.6 GB)

zenodo_get 4055395
tar -xvf openie6_models.tar.gz

Running Model

New command:

python run.py --mode splitpredict --inp sentences.txt --out predictions.txt --rescoring --task oie --gpus 1 --oie_model models/oie_model/epoch=14_eval_acc=0.551_v0.ckpt --conj_model models/conj_model/epoch=28_eval_acc=0.854.ckpt --rescore_model models/rescore_model --num_extractions 5

Expected models:
models/conj_model: Performs coordination analysis
models/oie_model: Performs OpenIE extraction
models/rescore_model: Performs the final rescoring

--inp sentences.txt - File with one sentence per line
--out predictions.txt - File containing the generated extractions
--gpus - 0 for no GPU, 1 for a single GPU

Additional flags -

--type labels // outputs word-level aligned labels to the file path `out`+'.labels'
--type sentences // outputs decomposed sentences to the file path `out`+'.sentences'
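When the prediction command is launched from another script, it can help to assemble it as an argument list. The sketch below only rearranges the flags and checkpoint paths shown in the command above; `build_splitpredict_command` is a hypothetical wrapper, not something the repository provides.

```python
def build_splitpredict_command(inp, out, gpus=1, num_extractions=5):
    """Return the README's splitpredict command as an argument list."""
    return [
        "python", "run.py",
        "--mode", "splitpredict",
        "--inp", inp,
        "--out", out,
        "--rescoring",
        "--task", "oie",
        "--gpus", str(gpus),
        "--oie_model", "models/oie_model/epoch=14_eval_acc=0.551_v0.ckpt",
        "--conj_model", "models/conj_model/epoch=28_eval_acc=0.854.ckpt",
        "--rescore_model", "models/rescore_model",
        "--num_extractions", str(num_extractions),
    ]


# Usage (run from the repository root):
#   import subprocess
#   subprocess.run(build_splitpredict_command("sentences.txt", "predictions.txt"), check=True)
```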

Additional Notes:

The model is trained on tokenized sentences and therefore requires tokenized sentences at prediction time as well. The code currently uses nltk tokenization for this purpose, so the sentences in the output will be the tokenized versions and may differ from the input sentences. If this is not desirable, you can comment out the nltk tokenization in data.py and make sure that your sentences are tokenized beforehand.
Due to an artifact of the training data for the conjunction model, each sentence must end with a full stop for the model to function correctly.
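The full-stop requirement can be enforced before writing the input file. This is a minimal sketch under stated assumptions: `ensure_full_stop` and `prepare_lines` are hypothetical helpers, and appending " ." (a space, then the period) is assumed to be an acceptable tokenized form. It is a blunt rule: a sentence ending in "?" or "!" also gets a full stop appended.

```python
def ensure_full_stop(sentence):
    """Return the stripped sentence, adding ' .' if it does not end with a full stop."""
    stripped = sentence.strip()
    if not stripped or stripped.endswith("."):
        return stripped
    return stripped + " ."


def prepare_lines(sentences):
    """Drop empty lines and make every remaining sentence end with a full stop."""
    return [ensure_full_stop(s) for s in sentences if s.strip()]


# Usage: write one prepared sentence per line, as --inp expects.
#   with open("sentences.txt", "w") as f:
#       f.write("\n".join(prepare_lines(raw_sentences)) + "\n")
```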

Training Model

Warmup Model

Training:

python run.py --save models/warmup_oie_model --mode train_test --model_str bert-base-cased --task oie --epochs 30 --gpus 1 --batch_size 24 --optimizer adamW --lr 2e-05 --iterative_layers 2

Testing:

python run.py --save models/warmup_oie_model --mode test --batch_size 24 --model_str bert-base-cased --task oie --gpus 1

Carb F1: 52.4, Carb AUC: 33.8

Predicting

python run.py --save models/warmup_oie_model --mode predict --model_str bert-base-cased --task oie --gpus 1 --inp sentences.txt --out predictions.txt

Time (Approx): 142 extractions/second

Constrained Model

Training

python run.py --save models/oie_model --mode resume --model_str bert-base-cased --task oie --epochs 16 --gpus 1 --batch_size 16 --optimizer adam --lr 5e-06 --iterative_layers 2 --checkpoint models/warmup_oie_model/epoch=13_eval_acc=0.544.ckpt --constraints posm_hvc_hvr_hve --save_k 3 --accumulate_grad_batches 2 --gradient_clip_val 1 --multi_opt --lr 2e-5 --wreg 1 --cweights 3_3_3_3 --val_check_interval 0.1

Testing

python run.py --save models/oie_model --mode test --batch_size 16 --model_str bert-base-cased --task oie --gpus 1

Carb F1: 54.0, Carb AUC: 35.7

Predicting

python run.py --save models/oie_model --mode predict --model_str bert-base-cased --task oie --gpus 1 --inp sentences.txt --out predictions.txt

Time (Approx): 142 extractions/second

NOTE: Due to a bug in the code, we end up using a loss function based only on the constrained loss term and not the original cross-entropy (CE) loss. It still seems to work well because the warmup model is already trained with the CE loss and the constrained training is initialized from the warmup model.

Running Coordination Analysis

python run.py --save models/conj_model --mode train_test --model_str bert-large-cased --task conj --epochs 40 --gpus 1 --batch_size 32 --optimizer adamW --lr 2e-05 --iterative_layers 2

F1: 87.8

Final Model

Running

python run.py --mode splitpredict --inp carb/data/carb_sentences.txt --out models/results/final --rescoring --task oie --gpus 1 --oie_model models/oie_model/epoch=14_eval_acc=0.551_v0.ckpt --conj_model models/conj_model/epoch=28_eval_acc=0.854.ckpt --rescore_model models/rescore_model --num_extractions 5 
python utils/oie_to_allennlp.py --inp models/results/final --out models/results/final.carb
python carb/carb.py --allennlp models/results/final.carb --gold carb/data/gold/test.tsv --out /dev/null

Carb F1: 52.7, Carb AUC: 33.7
Time (Approx): 31 extractions/second

Evaluate using other metrics (Carb(s,s), Wire57 and OIE-16)

bash carb/evaluate_all.sh models/results/final.carb carb/data/gold/test.tsv

Carb(s,s): F1: 46.4, AUC: 26.8
Carb(s,m) ==> Carb: F1: 52.7, AUC: 33.7
OIE16: F1: 65.6, AUC: 48.4
Wire57: F1: 40.0

CITE

If you use this code in your research, please cite:

@inproceedings{kolluru&al20,
    title = "{O}pen{IE}6: {I}terative {G}rid {L}abeling and {C}oordination {A}nalysis for {O}pen {I}nformation {E}xtraction",
    author = "Kolluru, Keshav and
      Adlakha, Vaibhav and
      Aggarwal, Samarth and
      Mausam and
      Chakrabarti, Soumen",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
    month = nov,
    year = "2020",
    address = "Online"
}

LICENSE

Note that the license is the full GPL, which allows many free uses, but not its use in proprietary software which is distributed to others. For distributors of proprietary software, you can contact us for commercial licensing.

CONTACT

In case of any issues, please send a mail to keshav.kolluru (at) gmail (dot) com

openie6's Issues

Can you provide the original PTB dataset used in your work?

To compare with your work, I need the original PTB dataset [1] used in OpenIE6 model. But this data set can't be found on the Internet now. Can you provide one?

[1] Jessica Ficler, Yoav Goldberg: Coordination Annotation Extension in the Penn Tree Bank. ACL (1) 2016

Running as Server

Is it possible to run openie6 triple extraction as a server similar to how https://github.com/dair-iitd/OpenIE-standalone can?

Or alternatively, can I extract triples from a stream of documents without having to reload the model every time?

Training Warmup model fails

Hi there. The warmup training seems to fail. Error below

Validation sanity check:  50%| 1/2 [00:00<00:00,  3.88it/s]
Results: {'eval_f1': 0, 'eval_auc': 0, 'eval_lastf1': 0}
Epoch 1: 100%| 3826/3827 [10:55<00:00,  5.84it/s, best=-inf, loss=1.060]
Results: {'eval_f1': 0, 'eval_auc': 0, 'eval_lastf1': 0} | 27/27 [00:02<00:00, 11.42it/s]
Epoch 1: 100%| 3827/3827 [10:55<00:00,  5.84it/s, best=-inf, loss=1.060]
Epoch 00000: eval_acc reached 0.00000 (best 0.00000), saving model to models/warmup_oie_model/epoch=00_eval_acc=0.000.ckpt as top 1
INFO:lightning:
Epoch 00000: eval_acc reached 0.00000 (best 0.00000), saving model to models/warmup_oie_model/epoch=00_eval_acc=0.000.ckpt as top 1
Epoch 2:   0%| 0/3827 [00:00<?, ?it/s, best=-inf, loss=1.060]
Traceback (most recent call last):
  File "run.py", line 469, in <module>
    main(hyperparams)
  File "run.py", line 459, in main
    train_dataloader, val_dataloader, test_dataloader, all_sentences)
  File "run.py", line 60, in train
    val_dataloaders=val_dataloader)
  File "/home//.conda/envs/openie_3/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 859, in fit
    self.single_gpu_train(model)
  File "/home//.conda/envs/openie_3/lib/python3.7/site-packages/pytorch_lightning/trainer/distrib_parts.py", line 503, in single_gpu_train
    self.run_pretrain_routine(model)
  File "/home//.conda/envs/openie_3/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1015, in run_pretrain_routine
    self.train()
  File "/home//.conda/envs/openie_3/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 347, in train
    self.run_training_epoch()
  File "/home//.conda/envs/openie_3/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 419, in run_training_epoch
    _outputs = self.run_training_batch(batch, batch_idx)
  File "/home//.conda/envs/openie_3/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 638, in run_training_batch
    self.on_batch_end()
  File "/home//.conda/envs/openie_3/lib/python3.7/site-packages/pytorch_lightning/trainer/callback_hook.py", line 63, in on_batch_end
    callback.on_batch_end(self, self.get_model())
  File "/home//.conda/envs/openie_3/lib/python3.7/site-packages/pytorch_lightning/callbacks/progress.py", line 326, in on_batch_end
    self.main_progress_bar.set_postfix(**trainer.progress_bar_dict)
  File "/home//.conda/envs/openie_3/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 750, in progress_bar_dict
    return dict(**ref_model.get_progress_bar_dict(), **self.progress_bar_metrics)
  File "/home//openie6/model.py", line 263, in get_progress_bar_dict
    best = self.trainer.checkpoint_callback.kth_value.item()
AttributeError: 'int' object has no attribute 'item'
Exception ignored in: <function tqdm.__del__ at 0x7ff34b852b90>
Traceback (most recent call last):
  File "/home//.conda/envs/openie_3/lib/python3.7/site-packages/tqdm/std.py", line 1086, in __del__
  File "/home//.conda/envs/openie_3/lib/python3.7/site-packages/tqdm/std.py", line 1293, in close
  File "/home//.conda/envs/openie_3/lib/python3.7/site-packages/tqdm/std.py", line 1471, in display
  File "/home//.conda/envs/openie_3/lib/python3.7/site-packages/tqdm/std.py", line 1089, in __repr__
  File "/home//.conda/envs/openie_3/lib/python3.7/site-packages/tqdm/std.py", line 1433, in format_dict
TypeError: cannot unpack non-iterable NoneType object
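The traceback shows `kth_value` being a plain int at the moment `.item()` is called on it. One possible workaround, sketched here and not confirmed by the maintainers, is to convert defensively; `as_python_number` is a hypothetical helper.

```python
def as_python_number(value):
    """Return a plain number whether `value` is tensor-like (has .item()) or already a number."""
    item = getattr(value, "item", None)
    return item() if callable(item) else value


# Possible use at the failing line in model.py (an assumption, untested against the repo):
#   best = as_python_number(self.trainer.checkpoint_callback.kth_value)
```

The helper leaves ints and floats untouched and only calls `.item()` when the object provides it, so it behaves the same before and after the checkpoint callback has stored a tensor.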

Possible error when computing final loss

In the paper, the final loss is obtained by adding constrained loss to cross entropy:

But in model.py, it seems that only the constrained loss is used:

If I understand the paper correctly, should loss = const_loss be changed to loss += const_loss?
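The difference between the two statements can be illustrated with plain numbers. This is a toy sketch, not the code in model.py: `total_loss` is a hypothetical function, and any weighting of the constraint term is left out.

```python
def total_loss(ce_loss, const_loss, add_to_ce=True):
    """Combine cross-entropy and constraint losses.

    add_to_ce=True  mirrors `loss += const_loss` (the CE term is kept).
    add_to_ce=False mirrors `loss = const_loss` (the CE term is discarded).
    """
    loss = ce_loss
    if add_to_ce:
        loss += const_loss
    else:
        loss = const_loss
    return loss


print(total_loss(1.0, 0.25))                   # 1.25
print(total_loss(1.0, 0.25, add_to_ce=False))  # 0.25
```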

Question about IL embeddings in your paper

Hello,

openie6 is very interesting both in its performance and its approach to Open IE via grid labeling. Thank you for sharing your work.

In your paper, you only retain the embeddings of first word-piece for words that were broken into multiple word-pieces, instead of summing or using another technique to combine the word-piece embeddings.

Is there a particular reason why?

Thank you!
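For readers unfamiliar with the technique being asked about, first-word-piece selection can be sketched as follows. It assumes BERT-style WordPiece output in which continuation pieces carry a "##" prefix, ignores special tokens such as [CLS] and [SEP], and uses toy one-dimensional embeddings; both function names are hypothetical.

```python
def first_wordpiece_indices(wordpieces):
    """Indices of pieces that start a word (those without the '##' prefix)."""
    return [i for i, piece in enumerate(wordpieces) if not piece.startswith("##")]


def keep_first_wordpiece(embeddings, wordpieces):
    """Select one embedding per word: the embedding of its first word-piece."""
    return [embeddings[i] for i in first_wordpiece_indices(wordpieces)]


# Toy example: the first word is split into two pieces, the second is not.
pieces = ["play", "##ing", "chess"]
vectors = [[0.1], [0.2], [0.3]]
print(keep_first_wordpiece(vectors, pieces))  # [[0.1], [0.3]]
```

Summing or averaging the pieces of each word would be the alternative the question alludes to; the sketch shows only the first-piece variant.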

Dependencies conflict during installation

Hi and many thanks for the new Openie version.

I had some issues during installation on a clean environment. I followed the instructions for the creation of a new conda env and upon installing the reqs I ended up with unresolvable dependency conflicts.

The only way to fix this was to loosen the requirements from the requirements.txt file. Attached you can find the version that worked for me (cpu only).
requirements.txt

It would be useful to know if anyone else faces the same issue.

Where is the carb/data/test_gold_allennlp_format.txt mentioned in carb/evaluate_all.sh?

Where is the carb/data/test_gold_allennlp_format.txt file mentioned in carb/evaluate_all.sh?

Installation Issue

OS: Windows 10
Base Environment: Anaconda

When I try to install the packages in requirements.txt as per the instructions in the README file, I get the error below.

[Bug] `greenlet.greenlet size changed` error when calling run.py

Hi there,

I'm trying to get OpenIE6 running, but seem to be running into an error with gevent. See the traceback below...

  File "run.py", line 24, in <module>
    from imojie.aggregate.score import rescore
  File "imojie/imojie/aggregate/score.py", line 8, in <module>
    from allennlp.commands.evaluate import evaluate_from_args
  File "/home/ec2-user/mambaforge/envs/oie6/lib/python3.6/site-packages/allennlp/commands/__init__.py", line 8, in <module>
    from allennlp.commands.configure import Configure
  File "/home/ec2-user/mambaforge/envs/oie6/lib/python3.6/site-packages/allennlp/commands/configure.py", line 23, in <module>
    from gevent.pywsgi import WSGIServer
  File "/home/ec2-user/mambaforge/envs/oie6/lib/python3.6/site-packages/gevent/__init__.py", line 87, in <module>
    from gevent._hub_local import get_hub
  File "/home/ec2-user/mambaforge/envs/oie6/lib/python3.6/site-packages/gevent/_hub_local.py", line 101, in <module>
    import_c_accel(globals(), 'gevent.__hub_local')
  File "/home/ec2-user/mambaforge/envs/oie6/lib/python3.6/site-packages/gevent/_util.py", line 105, in import_c_accel
    mod = importlib.import_module(cname)
  File "/home/ec2-user/mambaforge/envs/oie6/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "src/gevent/__greenlet_primitives.pxd", line 12, in init gevent.__hub_local
ValueError: greenlet.greenlet size changed, may indicate binary incompatibility. Expected 128 from C header, got 40 from PyObject

Steps to Reproduce

conda create -n openie6 python=3.6
conda activate openie6
pip install -r requirements.txt
python -m nltk.downloader stopwords
python -m nltk.downloader punkt 
zenodo_get 4055395
tar -xvf openie6_models.tar.gz

# run model
python run.py --mode splitpredict --inp sentences.txt --out predictions.txt --rescoring --task oie --gpus 1 --oie_model models/oie_model/epoch=14_eval_acc=0.551_v0.ckpt --conj_model models/conj_model/epoch=28_eval_acc=0.854.ckpt --rescore_model models/rescore_model --num_extractions 5

Using a different dataset to train, validate, and test

Hello,

I'd like to train, validate, and test openie6 using a different dataset.

I've already replaced the training set data/openie4_labels, but am not sure how to replace the validation and test sets.

My thought was to replace carb/data/test.txt and carb/data/dev.txt but noticed they do not contain ARG/REL tags but only NONE tags. I figured they were just being used to trigger inference, and validation and evaluation happens somewhere else.

Thanks in advance!

When task == 'conj', there is a problem with the training data format

Thank you for taking the time to look at my question. I want to know how the training data format is generated and which tools are used when task == 'conj'.

error for downloading model

I tried to download the model by doing

zenodo_get 4055395

but I kept getting the following error:

Too many errors.
Download is aborted.

I am not sure how to fix this.

Conjunction analysis code

Where can I find the code for the conjunction analysis portion?

File Deserialization Error during Splitpredict Command Execution with Pretrained Model.

Hello there!

I appreciate you taking the time to consider my question amidst your busy schedule. I'm currently attempting to perform inference using your checkpoint (pretrained model). While I can successfully make predictions using the standard predict command (I obtain results without conjunctive sentence resolution), I encounter an error related to file deserialization when using the splitpredict command.

(openie6) C:\Users\PC\openie6-master>python run.py --mode splitpredict --inp carb/data/carb_sentences.txt --out models/results/final --rescoring --task oie --gpus 0 --oie_model models/oie_model/epoch=14_eval_acc=0.551_v0.ckpt --conj_model models/conj_model/epoch=28_eval_acc=0.854.ckpt --rescore_model models/rescore_model --num_extractions 5
WARNING:root:This caffe2 python run does not have GPU support. Will run in CPU only mode.
C:\Users\PC\anaconda3\envs\openie6\lib\site-packages\sklearn\utils\deprecation.py:143: FutureWarning: The sklearn.preprocessing.data module is  deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.preprocessing. Anything that cannot be imported from sklearn.preprocessing is now part of the private API.
  warnings.warn(message, FutureWarning)
1282it [00:00, 3498.79it/s]
Traceback (most recent call last):
  File "run.py", line 470, in <module>
    main(hyperparams)
  File "run.py", line 460, in main
    train_dataloader, val_dataloader, test_dataloader, all_sentences)
  File "run.py", line 161, in splitpredict
    model = predict(hparams, None, meta_data_vocab, None, None, test_dataloader, all_sentences)
  File "run.py", line 132, in predict
    loaded_hparams_dict = torch.load(checkpoint_path, map_location=torch.device('cpu'))['hparams']
  File "C:\Users\PC\anaconda3\envs\openie6\lib\site-packages\torch\serialization.py", line 386, in load
    return _load(f, map_location, pickle_module, **pickle_load_args)
  File "C:\Users\PC\anaconda3\envs\openie6\lib\site-packages\torch\serialization.py", line 580, in _load
    deserialized_objects[key]._set_from_file(f, offset, f_should_read_directly)
OSError: [Errno 22] Invalid argument

If you suspect this is an IPython 7.16.1 bug, please report it at:
    https://github.com/ipython/ipython/issues
or send an email to the mailing list at [email protected]

You can print a more detailed traceback right now with "%tb", or use "%debug"
to interactively debug it.

Extra-detailed tracebacks for bug-reporting purposes can be enabled via:
    %config Application.verbose_crash=True

Is there a way to get dependency parse outputs in addition to OpenIE ones?

Hi, I am trying to use both the OpenIE outputs and the dependency parse (DP) outputs for a sentence. Is it possible to get both from the same model/pipeline?

error in calculating loss?

model.py, line 182 reads:
loss = const_loss

Shouldn't this be
loss += const_loss

Missing allennlp.data

I'm following the install instructions to the letter, but I am persistently getting that allennlp.data is missing. What is the solution?

crash with curr_line_num referenced before assignment

run.py crashed on me with the following exception:

Traceback (most recent call last):
  File "run.py", line 469, in <module>
    main(hyperparams)
  File "run.py", line 459, in main
    train_dataloader, val_dataloader, test_dataloader, all_sentences)
  File "run.py", line 245, in splitpredict
    if curr_line_num not in no_extractions:
UnboundLocalError: local variable 'curr_line_num' referenced before assignment

Looking at the code, it looks like curr_line_num is only initialized when the for loop is executed at least once, which seems not to be true in my example.
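The usual remedy for this pattern is to bind the loop variable before the loop so it exists even when the loop body never runs. The sketch below illustrates the idea in isolation; it does not reproduce the loop in run.py, and `last_line_number` is a hypothetical function.

```python
def last_line_number(lines):
    """Return the index of the last processed line, or None if there were no lines."""
    curr_line_num = None  # defined even if the loop body never executes
    for curr_line_num, _line in enumerate(lines):
        pass  # per-line processing would go here
    return curr_line_num


print(last_line_number([]))          # None
print(last_line_number(["a", "b"]))  # 1
```

Code that later tests the variable would then need to handle the `None` case explicitly.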

meta_data_vocab consists of sentences, not tokens

Hello,

It looks like meta_data_vocab, used as an argument when declaring the model, is not in a format familiar to me. The vocabularies seem to be composed of sentences rather than tokens.

I tried not providing meta_data_vocab as an input, since it seems to be an optional argument, but that also fails due to a snippet of code that invokes meta_data_vocab.itos.

>>> META_DATA.vocab.itos
['<unk>', 'A trial run run on this initialization sentence initializes the OpenIE6 open information extractor .']

>>> meta_data_vocab.itos
['<unk>', 'A trial run run on this initialization sentence initializes the OpenIE6 open information extractor .']

Is meta_data_vocab meant to look like this? I was trying to declare a model that could be used for predicting any given input text, but meta_data_vocab seems to prevent this, assigning each model to one specific predict_fp.

Much thanks!

How can I use data_processing.py to create my own datasets?

I would like to create a dataset using the same method as for OpenIE4, but cannot find the file 'wiki.txt.openie4.processed'. Could it be released?

Error on gpus set to 0

I set gpus = 0 because I don't have a GPU installed. The script predicted the output and saved it to the prediction.txt file, but right before exiting, run.py throws this error:

Starting re-scoring ...

{'<arg1>': '[unused1]', '</arg1>': '[unused2]', '<rel>': '[unused3]', '</rel>': '[unused4]', '<arg2>': '[unused5]', '</arg2>': '[unused6]', 'SENT': '[unused7]', 'PRED': '[unused8]', '@COPY@': '[unused9]', 'EOE': '[unused10]'}
Traceback (most recent call last):
  File "run.py", line 469, in <module>
    main(hyperparams)
  File "run.py", line 459, in main
    train_dataloader, val_dataloader, test_dataloader, all_sentences)
  File "run.py", line 255, in splitpredict
    rescored = rescore(inp_fp, model_dir=hparams.rescore_model, batch_size=256)
  File "imojie/imojie/aggregate/score.py", line 90, in rescore
    return generate_probs(model_dir, inp_fp, weights_fp, topk, out_ext, cuda_device, overwrite=overwrite, extraction_ratio=ext_ratio, batch_size=batch_size, out=None)
  File "imojie/imojie/aggregate/score.py", line 39, in generate_probs
    probs = evaluate_from_args(args)
  File "/home/arsalan/Documents/openie6/topic/lib/python3.6/site-packages/allennlp/commands/evaluate.py", line 131, in evaluate_from_args
    archive = load_archive(args.archive_file, args.cuda_device, args.overrides, args.weights_file)
  File "/home/arsalan/Documents/openie6/topic/lib/python3.6/site-packages/allennlp/models/archival.py", line 230, in load_archive
    cuda_device=cuda_device)
  File "/home/arsalan/Documents/openie6/topic/lib/python3.6/site-packages/allennlp/models/model.py", line 329, in load
    return cls.by_name(model_type)._load(config, serialization_dir, weights_file, cuda_device)
  File "/home/arsalan/Documents/openie6/topic/lib/python3.6/site-packages/allennlp/models/model.py", line 277, in _load
    model_state = torch.load(weights_file, map_location=util.device_mapping(cuda_device))
  File "/home/arsalan/Documents/openie6/topic/lib/python3.6/site-packages/torch/serialization.py", line 386, in load
    return _load(f, map_location, pickle_module, **pickle_load_args)
  File "/home/arsalan/Documents/openie6/topic/lib/python3.6/site-packages/torch/serialization.py", line 573, in _load
    result = unpickler.load()
  File "/home/arsalan/Documents/openie6/topic/lib/python3.6/site-packages/torch/serialization.py", line 536, in persistent_load
    deserialized_objects[root_key] = restore_location(obj, location)
  File "/home/arsalan/Documents/openie6/topic/lib/python3.6/site-packages/torch/serialization.py", line 409, in restore_location
    result = map_location(storage, location)
  File "/home/arsalan/Documents/openie6/topic/lib/python3.6/site-packages/allennlp/nn/util.py", line 828, in inner_device_mapping
    return storage.cuda(cuda_device)
  File "/home/arsalan/Documents/openie6/topic/lib/python3.6/site-packages/torch/_utils.py", line 69, in _cuda
    with torch.cuda.device(device):
  File "/home/arsalan/Documents/openie6/topic/lib/python3.6/site-packages/torch/cuda/__init__.py", line 243, in __enter__
    self.prev_idx = torch._C._cuda_getDevice()
  File "/home/arsalan/Documents/openie6/topic/lib/python3.6/site-packages/torch/cuda/__init__.py", line 178, in _lazy_init
    _check_driver()
  File "/home/arsalan/Documents/openie6/topic/lib/python3.6/site-packages/torch/cuda/__init__.py", line 99, in _check_driver
    http://www.nvidia.com/Download/index.aspx""")
AssertionError: 
Found no NVIDIA driver on your system. Please check that you
have an NVIDIA GPU and installed a driver from
http://www.nvidia.com/Download/index.aspx

If you suspect this is an IPython 7.16.1 bug, please report it at:
    https://github.com/ipython/ipython/issues
or send an email to the mailing list at [email protected]

You can print a more detailed traceback right now with "%tb", or use "%debug"
to interactively debug it.

Extra-detailed tracebacks for bug-reporting purposes can be enabled via:
    %config Application.verbose_crash=True
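The traceback ends in AllenNLP's device mapping calling `storage.cuda(cuda_device)`, which suggests the rescoring step received a non-negative CUDA device id even though `--gpus 0` was requested. AllenNLP treats a device id of -1 as CPU. A minimal sketch of the mapping follows; `allennlp_cuda_device` is a hypothetical helper, and how the value would be threaded into `rescore` is an assumption that needs checking against imojie/imojie/aggregate/score.py.

```python
def allennlp_cuda_device(gpus):
    """Map the --gpus value to an AllenNLP-style device id (-1 means CPU)."""
    return 0 if gpus > 0 else -1


print(allennlp_cuda_device(0))  # -1
print(allennlp_cuda_device(1))  # 0
```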

Does this handle n-ary relationships, as previous OpenIE systems did?

Is the manual evaluation available?

Hi, I know I'm really late to this but is there any chance the authors can make the human evaluation results available? It seems it's not included in the repo. Thanks in advance!

Roadmap towards openie7

Context: I am a software engineer working on pushing the boundaries of natural language semantic parsing. To do such a task I critically need the state of the art of information extraction.

So I skimmed through your paper; it seems to have great ideas and seems to be the new state of the art. Do you think there is a newer paper that outperforms openie6?

You should probably update your paper by adding the results of PredPatt https://github.com/hltcoe/PredPatt
PredPatt seems like a paradigm shift in information extraction because it exploits Universal Dependencies.
PredPatt didn't get much human resources and is now in maintenance mode, but I do believe that a future openie7 could gain from Universal Dependencies insights to some extent,
especially by making use of the Enhanced Universal Dependencies, which PredPatt does not use!
So this is my main idea for improving openie.

The second one can already be done today and has a 100% chance of success:
I believe that openie6 is making use of BERT; if that's the case, then you should switch to XLNet instead, which is the real state-of-the-art pretrained language model.
I'd advise using the Transformers library to do that:
https://github.com/huggingface/transformers
It could yield significant accuracy improvements.

I and humanity really need better information extraction, please improve the SOTA once again!

@SaiKeshav

Unable to train constraint model

Hello,

Thank you for sharing your work. I am trying to train the model myself but am stuck at the constrained model step:

python run.py --save models/oie_model --mode resume --model_str bert-base-cased --task oie --epochs 16 --gpus 1 --batch_size 16 --optimizer adam --lr 5e-06 --iterative_layers 2 --checkpoint models/warmup_oie_model/epoch=15_eval_acc=0.485.ckpta --constraints posm_hvc_hvr_hve --save_k 3 --accumulate_grad_batches 2 --gradient_clip_val 1 --multi_opt --lr 2e-5 --wreg 1 --cweights 3_3_3_3 --val_check_interval 0.1

It looks like the model doesn't train under 'resume' mode. Validation sanity check is passed, but as trainer.fit() is called, it immediately exits with the following log output:

Validation sanity check: 100%|##########################################| 5/5 [00:00<00:00,  7.83it/s]
Results: {'eval_f1': 0.094, 'eval_auc': 0.0362, 'eval_lastf1': 0.094}
Training: 0it [00:00, ?it/s]

The TFevents file offers only the following uninformative output:

Processing event files... (this can take a few minutes)
======================================================================

These tags are in events.out.tfevents.1634007290.34a1b7eb1401.3870.0:
audio -
histograms -
images -
scalars -
tensor -
======================================================================

Event statistics for events.out.tfevents.1634007290.34a1b7eb1401.3870.0:
audio -
graph -
histograms -
images -
scalars -
sessionlog:checkpoint -
sessionlog:start -
sessionlog:stop -
tensor -
======================================================================

With --mode train_test, the same command and parameters train successfully. I'd appreciate any help I can get.

Thank you!
