stevezheng23 / xlnet_extension_tf
XLNet Extension in TensorFlow
License: Apache License 2.0
I don't have enough resources to train an XLNet NER model. Is there an open-source pretrained model for XLNet NER?
I am trying to execute run_coqa.sh with this command:
bash run_coqa.sh \
--gpudevice=0 \
--numgpus=1 \
--taskname=coqa \
--randomseed=100 \
--predicttag=xxxxx \
--modeldir=./model/xlnet_cased_L-24_H-1024_A-16 \
--datadir=./data \
--outputdir=./output_folder \
--numturn=100 \
--seqlen=512 \
--querylen=128 \
--answerlen=128 \
--batchsize=8 \
--learningrate=3e-5 \
--trainsteps=2000 \
--warmupsteps=120 \
--savesteps=500 \
--answerthreshold=128
Then I got this error:
I0819 21:56:00.241106 140032917555008 estimator.py:1147] Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
I0819 21:56:00.242035 140032917555008 basic_session_run_hooks.py:541] Create CheckpointSaverHook.
Traceback (most recent call last):
File "run_coqa.py", line 1777, in <module>
tf.app.run()
File "/home/huytran/miniconda3/envs/TF/lib/python3.7/site-packages/tensorflow/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/home/huytran/miniconda3/envs/TF/lib/python3.7/site-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/home/huytran/miniconda3/envs/TF/lib/python3.7/site-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "run_coqa.py", line 1716, in main
estimator.train(input_fn=train_input_fn, max_steps=FLAGS.train_steps)
File "/home/huytran/miniconda3/envs/TF/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py",
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/home/huytran/miniconda3/envs/TF/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py",
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/home/huytran/miniconda3/envs/TF/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py",
saving_listeners)
File "/home/huytran/miniconda3/envs/TF/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py",ec
scaffold=estimator_spec.scaffold)
File "/home/huytran/miniconda3/envs/TF/lib/python3.7/site-packages/tensorflow/python/training/basic_session_run_hooks.p
self._save_path = os.path.join(checkpoint_dir, checkpoint_basename)
File "/home/huytran/miniconda3/envs/TF/lib/python3.7/posixpath.py", line 80, in join
a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not FlagValues
execution time was 57 s.
Do you guys know how to solve this?
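The last frame of the traceback shows `os.path.join` receiving a `FlagValues` object as `checkpoint_dir`, which suggests that the whole flags object was passed where a directory string was expected. Below is a minimal sketch of that failure mode. `FakeFlagValues` is a hypothetical stand-in, not absl's real class, and the `output_dir` attribute name is an assumption based on the `--outputdir` option above.

```python
import os


class FakeFlagValues:
    """Hypothetical stand-in for absl's FlagValues (not the real class)."""

    def __init__(self, output_dir):
        self.output_dir = output_dir


def checkpoint_path(checkpoint_dir, basename="model.ckpt"):
    # Mirrors the os.path.join call in the last traceback frame.
    return os.path.join(checkpoint_dir, basename)


flags = FakeFlagValues(output_dir="./output_folder")

# Passing the flags container itself raises the same kind of TypeError.
try:
    checkpoint_path(flags)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Passing the string attribute works.
good_path = checkpoint_path(flags.output_dir)

print(error_message)
print(good_path)
```

If this is the cause, it would be worth checking where run_coqa.py hands a checkpoint or model directory to a hook or estimator and confirming that it passes a string attribute of the flags rather than the flags object itself.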
Just to test the implementation, I took the CoNLL-2003 data from https://github.com/synalp/NER/tree/master/corpus/CoNLL-2003 and then added a resource/label.vocab file in ${DATADIR}. My label.vocab file contains the following entities:
I-PER
B-PER
I-LOC
B-LOC
I-MISC
B-MISC
I-ORG
B-ORG
But when I run the training script, it gives me the following error:
Traceback (most recent call last):
File "run_ner.py", line 855, in <module>
tf.app.run()
File "/home/falak/.local/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/home/falak/.local/lib/python3.5/site-packages/absl/app.py", line 300, in run
_run_main(main, args)
File "/home/falak/.local/lib/python3.5/site-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "run_ner.py", line 786, in main
train_features = example_converter.convert_examples_to_features(train_examples)
File "run_ner.py", line 381, in convert_examples_to_features
feature = self.convert_single_example(example, logging=(idx < 5))
File "run_ner.py", line 327, in convert_single_example
label_ids.append(self.label_map[labels[i]])
KeyError: 'O'
This error disappears if I add the following additional entries to label.vocab:
O
X
<sep>
<unk>
<s>
</s>
<cls>
<sep>
<pad>
<mask>
<eod>
<eop>
So, is it expected that these entries be added to the label.vocab file?
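The `KeyError: 'O'` means the data contains a label that is absent from the label map built from label.vocab. A rough way to find such gaps before training is sketched below. The function names are hypothetical, and the sketch assumes whitespace-separated CoNLL-style lines with the NER tag in the last column; it only covers labels that appear in the data, not any special tokens the script may additionally expect.

```python
from collections import Counter


def read_conll_labels(lines, label_column=-1):
    """Count the labels found in CoNLL-style lines (assumed layout)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        # Skip sentence separators and document markers.
        if not line or line.startswith("-DOCSTART-"):
            continue
        counts[line.split()[label_column]] += 1
    return counts


def missing_labels(data_labels, vocab_labels):
    """Return labels used in the data but absent from the vocab."""
    return sorted(set(data_labels) - set(vocab_labels))


sample = [
    "-DOCSTART- -X- -X- O",
    "",
    "EU NNP I-NP I-ORG",
    "rejects VBZ I-VP O",
    "German JJ I-NP I-MISC",
    "call NN I-NP O",
]
vocab = ["I-PER", "B-PER", "I-LOC", "B-LOC",
         "I-MISC", "B-MISC", "I-ORG", "B-ORG"]

missing = missing_labels(read_conll_labels(sample), vocab)
print(missing)  # ['O']
```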
Thanks for uploading such good, readable code along with the experiment settings and results.
By the way, I have a question about CoQA InputFeature generation.
In the InputFeature generation code, it seems to be assumed that a doc span always contains the rationale needed to answer free-form questions.
That means that when a doc span has no clue for answering a free-form question, it can be labeled incorrectly.
Is this intended, or is there something I haven't understood?
Thanks for your work and have a good day!
Hi, thank you so much for sharing the code. I ran it on QuAC but only get an F1 score of around 60, which is still a large gap from the result you mentioned. I used the same parameters you mentioned, except for batch_size: my machine can only support bs=4. Is there something else I need to pay attention to? Can you give me some advice about this? Thanks! :P
Cannot find this file. Also, what's the usage of run_embed.py?
Hi, thanks for your hard work! Do you know how to add adversarial training when fine-tuning on the CoQA dataset?
I want to train an NER model without pretrained weights. I removed --init_checkpoint from the command and executed it, but I get the error message "--init_checkpoint must have a value other than None".
What should I do? Thanks.
I tried running with the base-size XLNet, a sequence length of 128, a batch size of 32, and 2000 training iterations, but I can only get 91.3 F1 with the Perl version of conlleval. Is that right?
missing file?
I want to use a pre-trained question answering model trained on CoQA, like the ones Hugging Face provides. Is something like that available?
I tried using Google Colab CPU and GPU notebooks to train XLNet on CoQA, but they keep crashing because of out-of-memory issues. I tried reducing the batch size to 1, but the problem still persists. Has anyone else faced similar issues and been able to solve them?
I have pretrained XLNet on a large Chinese corpus, but how do I run run_ner.py, and what is label.vocab?
Here are my parameters for training the SentencePiece model:
spm_train \
--input=data/wiki_all.txt \
--model_prefix=sp10m.cased.v3 \
--vocab_size=32000 \
--character_coverage=0.9995 \
--model_type=char \
--control_symbols='<cls>,<sep>,<pad>,<mask>,<eod>' \
--user_defined_symbols='<eop>,。' \
--shuffle_input_sentence \
--input_sentence_size=10000000
This is my pretraining result:
I0708 01:51:08.929747 140337454118720 train_gpu.py:300] [99500] | gnorm 5.37 lr 0.000000 | loss 2.08 | pplx 8.01, bpc 3.0017
I0708 01:52:52.577970 140337454118720 train_gpu.py:300] [99600] | gnorm 4.98 lr 0.000000 | loss 2.03 | pplx 7.60, bpc 2.9265
I0708 01:54:36.169189 140337454118720 train_gpu.py:300] [99700] | gnorm 5.21 lr 0.000000 | loss 2.04 | pplx 7.73, bpc 2.9500
I0708 01:56:19.727979 140337454118720 train_gpu.py:300] [99800] | gnorm 5.06 lr 0.000000 | loss 2.05 | pplx 7.79, bpc 2.9625
I0708 01:58:03.187680 140337454118720 train_gpu.py:300] [99900] | gnorm 5.06 lr 0.000000 | loss 2.01 | pplx 7.47, bpc 2.9009
I0708 01:59:46.560450 140337454118720 train_gpu.py:300] [100000] | gnorm 5.51 lr 0.000000 | loss 2.00 | pplx 7.38, bpc 2.8840
So should the label.vocab look like this?
<cls>
<sep>
<pad>
<mask>
<eod>
B-AnatomyPart
I-AnatomyPart
B-Diagnosis
I-Diagnosis
B-Drug
I-Drug
B-Lab
I-Lab
B-Procedure
I-Procedure
B-Radiology
I-Radiology
O
Thanks for your work on the XLNet extension! It is quite impressive how quickly this has been done.
I have a question about importing the xlnet package. When I tried running the NER experiment, I got an error at line 753 of run_ner.py saying "AttributeError: 'module' object has no attribute 'XLNetConfig'".
Do you have any idea why xlnet.XLNetConfig is not successfully imported here?
Thanks in advance for your answer!
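One possible cause, stated here only as an assumption, is that `import xlnet` resolves to a different module or package than the file that defines `XLNetConfig`. A small diagnostic sketch for checking which file a module name resolves to and whether it exposes a given attribute is shown below. `describe_module` is a hypothetical helper, and the standard-library `json` module is used as a stand-in so the snippet runs without xlnet installed.

```python
import importlib


def describe_module(module_name, attribute):
    """Report where a module was loaded from and whether it has an attribute."""
    module = importlib.import_module(module_name)
    return {
        # Packages without an __init__.py may have no __file__ at all.
        "file": getattr(module, "__file__", None),
        "has_attribute": hasattr(module, attribute),
    }


# Stand-in example; in the NER setup one would call
# describe_module("xlnet", "XLNetConfig") instead.
present = describe_module("json", "dumps")
absent = describe_module("json", "XLNetConfig")
print(present)
print(absent)
```

If the reported file is not the expected xlnet source file, or no file is reported at all, the import path is the first thing to check.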
Hi,
I was trying to run the NER task on a customized dataset. The training process was successful. However, when it reached the evaluation and prediction step, the program got stuck at "INFO:tensorflow:Done running local_init_op." and did not move forward. Is there any potential fix for this problem?
Here is the log:
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
2020-04-14 20:00:54.378068: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2020-04-14 20:00:54.378117: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-04-14 20:00:54.378127: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2020-04-14 20:00:54.378135: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2020-04-14 20:00:54.378210: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 23077 MB memory) -> physical GPU (device: 0, name: Quadro P6000, pci bus id: 0000:02:00.0, compute capability: 6.1)
INFO:tensorflow:Restoring parameters from output/ner/i2b2/checkpoint/model.ckpt-100
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
I got an OOM error when running with the CoNLL-2003 dataset on my GPU with 12 GB of memory. How can I solve this problem?
ResourceExhaustedError (see above for traceback): OOM when allocating tensor of shape [32000,768] and type float [[node model/transformer/word_embedding/lookup_table/Adam/Initializer/zeros (defined at xlnet/model_utils.py:164) = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [32000,768] values: [0 0 0...]...>, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]
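For scale, the tensor named in the error message is small on its own. The sketch below works out its size, assuming 4 bytes per element for the `DT_FLOAT` type shown in the message; `tensor_bytes` is a hypothetical helper.

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Size in bytes of a dense tensor with the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


size = tensor_bytes([32000, 768])
size_mib = size / 2**20
print(size, size_mib)  # 98304000 93.75
```

At roughly 94 MiB, this allocation is far below 12 GB, which suggests the GPU memory was already nearly exhausted by earlier allocations (or by another process) when this tensor was requested, rather than this tensor being the main consumer.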
Hi,
I'm trying to run the NER task, but I think the model training is running on the CPU instead of the GPU. Is there any way to train on the GPU? The only choice I saw in the flags is turning TPU on or off.
Hi Steve, first of all, thanks for this great repo.
I am wondering whether you add a BiLSTM + CRF layer after XLNet for the NER task?
Hi Steve,
Can you tell me what GPU setup (memory, number of GPUs) you used for your QuAC results?
I got some OOM errors when trying to reproduce them with your code >_>.
Thanks!