stevezheng23 / xlnet_extension_tf

XLNet Extension in TensorFlow

License: Apache License 2.0

Python 94.56% Shell 5.44%

artificial-intelligence natural-language-understanding natural-language-processing deep-learning machine-learning xlnet xlnet-ner xlnet-nlu xlnet-mrc xlnet-extension

xlnet_extension_tf's Issues

Is there a pre-trained NER model

I don't have enough resources to train an XLNet NER model. Is there an open-source pre-trained model for XLNet NER?

TypeError: expected str, bytes or os.PathLike object, not FlagValues

I am trying to execute the run_coqa.sh file with this command:

bash run_coqa.sh \
--gpudevice=0 \
--numgpus=1 \
--taskname=coqa \
--randomseed=100 \
--predicttag=xxxxx \
--modeldir=./model/xlnet_cased_L-24_H-1024_A-16 \
--datadir=./data \
--outputdir=./output_folder \
--numturn=100 \
--seqlen=512 \
--querylen=128 \
--answerlen=128 \
--batchsize=8 \
--learningrate=3e-5 \
--trainsteps=2000 \
--warmupsteps=120 \
--savesteps=500 \
--answerthreshold=128

Then I got this error:

I0819 21:56:00.241106 140032917555008 estimator.py:1147] Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
I0819 21:56:00.242035 140032917555008 basic_session_run_hooks.py:541] Create CheckpointSaverHook.
Traceback (most recent call last):
  File "run_coqa.py", line 1777, in <module>
    tf.app.run()
  File "/home/huytran/miniconda3/envs/TF/lib/python3.7/site-packages/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/home/huytran/miniconda3/envs/TF/lib/python3.7/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/home/huytran/miniconda3/envs/TF/lib/python3.7/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "run_coqa.py", line 1716, in main
    estimator.train(input_fn=train_input_fn, max_steps=FLAGS.train_steps)
  File "/home/huytran/miniconda3/envs/TF/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py",
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/home/huytran/miniconda3/envs/TF/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py",
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/home/huytran/miniconda3/envs/TF/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py",
    saving_listeners)
  File "/home/huytran/miniconda3/envs/TF/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py",ec
    scaffold=estimator_spec.scaffold)
  File "/home/huytran/miniconda3/envs/TF/lib/python3.7/site-packages/tensorflow/python/training/basic_session_run_hooks.p
    self._save_path = os.path.join(checkpoint_dir, checkpoint_basename)
  File "/home/huytran/miniconda3/envs/TF/lib/python3.7/posixpath.py", line 80, in join
    a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not FlagValues
execution time was 57 s.

Do you guys know how to solve this?
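
For anyone reading the traceback above: the last frame shows os.path.join receiving an object that is not a path. The sketch below reproduces only that failure mode. The FlagValues class here is a hypothetical stand-in (not absl's real class), and output_dir is an assumed attribute name; it does not claim to show where run_coqa.py goes wrong.

```python
import os


class FlagValues:
    """Hypothetical stand-in for a flags container; NOT absl's real class."""

    def __init__(self, output_dir):
        self.output_dir = output_dir  # assumed attribute name


def checkpoint_path(checkpoint_dir, basename="model.ckpt"):
    # Same kind of call as the failing frame: the directory must be path-like.
    return os.path.join(checkpoint_dir, basename)


flags = FlagValues(output_dir="./output_folder")

try:
    checkpoint_path(flags)  # the whole container is passed by mistake
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Passing the string attribute instead of the container works.
good_path = checkpoint_path(flags.output_dir)

print(error_message)
print(good_path)
```

If the same pattern applies here, checking which value ends up as the estimator's model directory would be a reasonable first step.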

What does label.vocab contain?

Just to test the implementation, I took the CoNLL-2003 data from https://github.com/synalp/NER/tree/master/corpus/CoNLL-2003 and added a resource/label.vocab file in ${DATADIR}. My label.vocab file contains the following entities:
I-PER
B-PER
I-LOC
B-LOC
I-MISC
B-MISC
I-ORG
B-ORG

But when I run the training script, it gives me the following error:
Traceback (most recent call last):
  File "run_ner.py", line 855, in <module>
    tf.app.run()
  File "/home/falak/.local/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/home/falak/.local/lib/python3.5/site-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/home/falak/.local/lib/python3.5/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "run_ner.py", line 786, in main
    train_features = example_converter.convert_examples_to_features(train_examples)
  File "run_ner.py", line 381, in convert_examples_to_features
    feature = self.convert_single_example(example, logging=(idx < 5))
  File "run_ner.py", line 327, in convert_single_example
    label_ids.append(self.label_map[labels[i]])
KeyError: 'O'

This error disappears if I add the following additional entries to label.vocab:

O
X
<sep>
<unk>
<s>
</s>
<cls>
<sep>
<pad>
<mask>
<eod>
<eop>

So, is it expected that these entries be added to the label.vocab file?
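
The KeyError above comes from looking up a tag that has no entry in the label map. A minimal sketch of a pre-flight check follows; it lists tags that occur in the data but not in the vocabulary. The function names and toy inputs are hypothetical and not part of run_ner.py, and the sketch says nothing about which special symbols the script itself requires.

```python
def load_label_map(label_lines):
    """Map each non-empty label line to an integer id, in file order."""
    labels = [line.strip() for line in label_lines if line.strip()]
    return {label: idx for idx, label in enumerate(labels)}


def missing_labels(tagged_sentences, label_map):
    """Return the sorted tags used in the data but absent from the map."""
    used = {tag for sentence in tagged_sentences for tag in sentence}
    return sorted(used - set(label_map))


# Toy inputs (hypothetical): the eight entity tags listed in the question,
# plus two short tag sequences that also use the outside tag "O".
vocab_lines = ["I-PER", "B-PER", "I-LOC", "B-LOC",
               "I-MISC", "B-MISC", "I-ORG", "B-ORG"]
data = [["I-PER", "O", "O"], ["I-LOC", "O"]]

label_map = load_label_map(vocab_lines)
print(missing_labels(data, label_map))
```

Running a check like this before training would surface a missing 'O' entry without waiting for feature conversion to fail.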

Question about InputFeature generation in coqa

Thanks for uploading such good, readable code along with the experiment settings and results.

By the way, I have a question about CoQA InputFeature generation.
In the InputFeature generation code,
it seems to assume that a doc span always contains the rationale needed to answer free-form questions.

That means that when a doc span has no clue for answering a free-form question, it can sometimes be labeled incorrectly.
Is this intended, or is there something I haven't understood?
Thanks for your work and have a good day!

About QuAC results

Hi, thank you so much for sharing the code. I ran it on QuAC but only get an F1 score of around 60, which is still a large gap from the result you reported. I used the same parameters you mentioned, except for batch_size, since my machine can only support a batch size of 4. Is there anything else I need to pay attention to? Can you give me some advice? Thanks! :P

FileNotFoundError: data path not found: data/conll2003/resource/label.vocab

I cannot find this file. Also, what is run_embed.py used for?

Is there an example to use adversarial training on coqa?

Hi, thanks for your hard work! Do you know how to add adversarial training when fine-tuning on the CoQA dataset?

How to train a model with randomly initialized weights

I want to train an NER model without pretrained weights. I removed --init_checkpoint from the command and ran it, but I got the error message "--init_checkpoint must have a value other than None".
What should I do? Thanks.

hard to get f1 value

I tried running base-size XLNet with a sequence length of 128, a batch size of 32, and 2000 training steps, but I can only get 91.3 F1 with the Perl version of conlleval. Is that right?

Where is tool/eval_quac.py

Is this file missing?

Is there a pre-trained model trained on coQA available

I want to use a pre-trained question answering model trained on CoQA, like those available from Hugging Face. Is something like that available?

Can XLNet support variable-length data? The input may be a long context.

Can't run CoQA training script in Google Colab

I tried using Google Colab CPU and GPU notebooks to train XLNet on CoQA, but they keep crashing because of out-of-memory issues. I tried reducing the batch size to 1, but the problem persists. Has anyone else faced similar issues and been able to solve them?

How do I run NER on other languages like Chinese?

I have pretrained XLNet on a large Chinese corpus, but how do I run ner.py, and what is label.vocab?
Here are the parameters I used to train the SentencePiece model:

spm_train \
	--input=data/wiki_all.txt \
	--model_prefix=sp10m.cased.v3 \
	--vocab_size=32000 \
	--character_coverage=0.9995 \
	--model_type=char \
	--control_symbols='<cls>,<sep>,<pad>,<mask>,<eod>' \
	--user_defined_symbols='<eop>,。' \
	--shuffle_input_sentence \
	--input_sentence_size=10000000

This is my pretraining result:

I0708 01:51:08.929747 140337454118720 train_gpu.py:300] [99500] | gnorm 5.37 lr 0.000000 | loss 2.08 | pplx    8.01, bpc  3.0017
I0708 01:52:52.577970 140337454118720 train_gpu.py:300] [99600] | gnorm 4.98 lr 0.000000 | loss 2.03 | pplx    7.60, bpc  2.9265
I0708 01:54:36.169189 140337454118720 train_gpu.py:300] [99700] | gnorm 5.21 lr 0.000000 | loss 2.04 | pplx    7.73, bpc  2.9500
I0708 01:56:19.727979 140337454118720 train_gpu.py:300] [99800] | gnorm 5.06 lr 0.000000 | loss 2.05 | pplx    7.79, bpc  2.9625
I0708 01:58:03.187680 140337454118720 train_gpu.py:300] [99900] | gnorm 5.06 lr 0.000000 | loss 2.01 | pplx    7.47, bpc  2.9009
I0708 01:59:46.560450 140337454118720 train_gpu.py:300] [100000] | gnorm 5.51 lr 0.000000 | loss 2.00 | pplx    7.38, bpc  2.8840
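
The pplx and bpc columns in a log like this look as if they are derived from the loss. Assuming the loss is a mean cross-entropy in nats, perplexity would be exp(loss) and bpc would be loss / ln 2, so pplx = 2 ** bpc. That relation is inferred from the printed numbers and is an assumption, not something taken from train_gpu.py. A quick consistency check on two of the rows:

```python
import math


def perplexity(loss_nats):
    """Perplexity for a mean cross-entropy loss measured in nats."""
    return math.exp(loss_nats)


def bits(loss_nats):
    """Convert a loss in nats to bits by dividing by ln 2."""
    return loss_nats / math.log(2)


def perplexity_from_bits(bpc):
    """Perplexity expressed through a loss measured in bits."""
    return 2.0 ** bpc


# (loss, pplx, bpc) copied from the first and last log rows above.
rows = [(2.08, 8.01, 3.0017), (2.00, 7.38, 2.8840)]
for loss, pplx, bpc in rows:
    # The printed loss is rounded to two decimals, so expect small gaps.
    print(loss, round(perplexity(loss), 2), round(bits(loss), 4),
          pplx, round(perplexity_from_bits(bpc), 2))
```

Small mismatches are expected because the logged loss is rounded; recomputing perplexity from the bpc column tracks the logged pplx more closely.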

So should label.vocab look like this?

<cls>
<sep>
<pad>
<mask>
<eod>
B-AnatomyPart
I-AnatomyPart
B-Diagnosis
I-Diagnosis
B-Drug
I-Drug
B-Lab
I-Lab
B-Procedure
I-Procedure
B-Radiology
I-Radiology
O

AttributeError: 'module' object has no attribute 'XLNetConfig'

Thanks for your work on the XLNet extension! It is quite impressive how quickly this was done.
I have a question about importing the xlnet package. When I tried running the NER experiment, I got an error at line 753 of run_ner.py:
"AttributeError: 'module' object has no attribute 'XLNetConfig'".
Do you have any idea why xlnet.XLNetConfig is not imported successfully here?

Thanks in advance for your answer!

Evaluation and Prediction

Hi,

I was trying to run the NER task on a customized dataset. The training process was successful. However, when it reached the evaluation and prediction step, the program got stuck at "INFO:tensorflow:Done running local_init_op." and did not move forward. Is there a potential fix for this problem?

Here is the log:

INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
2020-04-14 20:00:54.378068: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2020-04-14 20:00:54.378117: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-04-14 20:00:54.378127: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2020-04-14 20:00:54.378135: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2020-04-14 20:00:54.378210: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 23077 MB memory) -> physical GPU (device: 0, name: Quadro P6000, pci bus id: 0000:02:00.0, compute capability: 6.1)
INFO:tensorflow:Restoring parameters from output/ner/i2b2/checkpoint/model.ckpt-100
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.

ResourceExhaustedError: OOM when allocating tensor of shape [32000,768]

I got an OOM error when running on the conll2003 dataset with my 12 GB GPU. How can I solve this problem?
ResourceExhaustedError (see above for traceback): OOM when allocating tensor of shape [32000,768] and type float [[node model/transformer/word_embedding/lookup_table/Adam/Initializer/zeros (defined at xlnet/model_utils.py:164) = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [32000,768] values: [0 0 0...]...>, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]
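
As a rough size check, and assuming "type float" in the message means 4-byte float32, the tensor named in the error can be sized directly. The helper name below is hypothetical. The result is small relative to 12 GB, which suggests (but does not prove) that most of the memory was already taken by other allocations when this one failed.

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Bytes needed for a dense tensor; 4 bytes per element assumes float32."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


size = tensor_bytes([32000, 768])
size_mib = size / 2**20  # bytes to MiB
print(size, size_mib)
```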
