supercoderhawk / deep-keyphrase
seq2seq-based keyphrase generation model set, including CopyRNN, CopyCNN and CopyTransformer
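In your answer to the "how to run test" issue, you attached a snippet for testing and said that running copy_rnn/predict.py is enough. However, that file has no main function, and predict_kp20k.sh runs predict_runner.py instead. So I changed the main function of predict_runner.py to the following: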
```python
import json

from munch import Munch
from deep_keyphrase.copy_rnn.predict import CopyRnnPredictor

# Your model path
model_path = 'data/kp20k/copyrnn_kp20k_basic-20210318-152102/copyrnn_kp20k_basic_epoch_3_batch_1355000.model'
# Your vocab path
vocab_path = 'data/vocab_kp20k.txt'
keyword_generator = CopyRnnPredictor({'model': model_path},
                                     vocab_info=vocab_path,
                                     beam_size=50,
                                     max_target_len=5,
                                     max_src_length=800)

# Test some single cases, or use as a component in an online service
tokens = ['numerous', 'studies', 'have', 'demonstrated', 'that', 'h2o2-induced',
          'apoptosis', 'is', 'mediated', 'by', 'activation', 'of', 'mapks']
print(keyword_generator.predict([tokens], delimiter=' ', tokenized=True))

# Evaluate a whole file
src_filename = 'data/kp20k.test.jsonl'
dest_filename = 'data/kp20k_pred.jsonl'
# read_json was not imported in the snippet; the standard json module loads the same config
config_path = 'data/kp20k/copyrnn_kp20k_basic-20210318-152102/copyrnn_kp20k_basic_epoch_3_batch_1355000.json'
with open(config_path, encoding='utf-8') as f:
    config = json.load(f)
keyword_generator.eval_predict(src_filename, dest_filename, args=Munch(config))
```
This successfully produced the kp20k_pred.jsonl file. How do I then evaluate the final results?
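A common way to score keyphrase predictions is precision, recall and F1 at a cutoff k: take the top-k distinct predicted phrases and count how many match a gold phrase. The sketch below assumes each line of kp20k_pred.jsonl is a JSON object with the gold phrases under a field named `keyphrases` and the predictions under `pred_keyphrases`; both field names are hypothetical, so check the real file first. It only lowercases and normalizes whitespace, whereas published KP20k evaluations often also apply Porter stemming, so the numbers are not directly comparable to reported results.

```python
import json
from typing import Dict, List


def normalize(phrase: str) -> str:
    # Lowercase and collapse whitespace; no stemming in this sketch.
    return " ".join(phrase.lower().split())


def prf_at_k(gold: List[str], pred: List[str], k: int) -> Dict[str, float]:
    """Precision/recall/F1 of the top-k distinct predictions against the gold set."""
    gold_set = {normalize(p) for p in gold}
    top_k: List[str] = []
    for p in pred:  # de-duplicate while keeping rank order
        n = normalize(p)
        if n not in top_k:
            top_k.append(n)
        if len(top_k) == k:
            break
    hits = sum(1 for p in top_k if p in gold_set)
    precision = hits / k if k else 0.0  # divide by k even if fewer were predicted
    recall = hits / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def evaluate_file(path: str, k: int = 5,
                  gold_key: str = "keyphrases",        # hypothetical field name
                  pred_key: str = "pred_keyphrases",   # hypothetical field name
                  ) -> Dict[str, float]:
    """Macro-average P/R/F1@k over all records of a JSON Lines file."""
    totals = {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    count = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            record = json.loads(line)
            scores = prf_at_k(record[gold_key], record[pred_key], k)
            for name in totals:
                totals[name] += scores[name]
            count += 1
    return {name: value / count for name, value in totals.items()} if count else totals
```

Usage would be something like `print(evaluate_file('data/kp20k_pred.jsonl', k=5))`, once the two field names are adjusted to match the file.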
After setting the parameter that controls copy_net to False, I get an error:
[2021-03-18 09:34:44,921] [train] destination dir:/home/yons/deep-keyphrase/data/kp20k/copyrnn_kp20k_basic-20210318-093444/
[2021-03-18 09:35:14,404] [train] exception occurred
[2021-03-18 09:35:14,406] [train] Traceback (most recent call last):
File "/home/yons/.virtualenvs/kpg/lib/python3.7/site-packages/deep_keyphrase/base_trainer.py", line 70, in train
self.train_func()
File "/home/yons/.virtualenvs/kpg/lib/python3.7/site-packages/deep_keyphrase/base_trainer.py", line 92, in train_func
for batch_idx, batch in enumerate(self.train_loader):
File "/home/yons/.virtualenvs/kpg/lib/python3.7/site-packages/deep_keyphrase/dataloader.py", line 55, in __iter__
return iter(KeyphraseDataIterator(self))
File "/home/yons/.virtualenvs/kpg/lib/python3.7/site-packages/deep_keyphrase/dataloader.py", line 129, in __init__
worker.start()
File "/home/yons/anaconda3/lib/python3.7/multiprocessing/process.py", line 112, in start
self._popen = self._Popen(self)
File "/home/yons/anaconda3/lib/python3.7/multiprocessing/context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "/home/yons/anaconda3/lib/python3.7/multiprocessing/context.py", line 277, in _Popen
return Popen(process_obj)
File "/home/yons/anaconda3/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
self._launch(process_obj)
File "/home/yons/anaconda3/lib/python3.7/multiprocessing/popen_fork.py", line 70, in _launch
self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory
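Why does this error occur?
I tried setting worker_num to 0 and to 1; neither worked.
Switching the optimizer to SGD, setting the batch size to 32 and setting max_length to 600 did not help either.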
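The traceback shows the failure inside `os.fork()`, called when the data loader starts a worker process (dataloader.py, line 129, `worker.start()`). On Linux, `fork` can fail with errno 12 (ENOMEM) when the kernel refuses to commit memory for a copy of a large parent process, for example under strict overcommit settings or with little free RAM and swap; that is one plausible reading, not a confirmed diagnosis. Since worker_num=0 reportedly still failed, it may be worth checking whether the loader starts a process even in that case. One defensive pattern is to catch that specific error and fall back to processing in the main process. The helper below is hypothetical and not part of deep-keyphrase; `multiprocessing.Pool` stands in for the repository's own workers.

```python
import errno
import multiprocessing as mp
from typing import Callable, Iterator, List


def iterate_with_fallback(items: List, fn: Callable, num_workers: int) -> Iterator:
    """Yield fn(item) for each item, using worker processes when they can be started.

    Hypothetical helper: if starting the workers fails because the OS cannot
    allocate memory (errno 12, ENOMEM), the items are processed in this process.
    """
    if num_workers > 0:
        try:
            with mp.Pool(num_workers) as pool:
                results = pool.map(fn, items)
            yield from results
            return
        except OSError as exc:
            if exc.errno != errno.ENOMEM:
                raise  # other OS errors are real failures, do not hide them
    # Serial fallback: no fork, so no extra copy of the parent process is needed.
    for item in items:
        yield fn(item)


def square(x: int) -> int:
    return x * x
```

The serial path is slower but needs no extra process, which is the trade-off when memory rather than CPU is the bottleneck.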
Hello,
I got it running by following your steps, but I have two questions:
1) How do I enable the GPU?
2) How do I run validation on the test data?
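On the GPU question: in PyTorch, a model runs on the GPU only when the model and its input tensors are moved to a CUDA device. Whether deep-keyphrase exposes its own flag for this is not answered in this thread. The helper below is a hypothetical sketch of the usual device check, falling back to CPU when PyTorch is not installed or sees no GPU; the environment variable `CUDA_VISIBLE_DEVICES` controls which GPUs PyTorch can see.

```python
def pick_device(prefer_gpu: bool = True) -> str:
    """Return 'cuda' when PyTorch is installed and sees a GPU, otherwise 'cpu'."""
    if not prefer_gpu:
        return "cpu"
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
# With PyTorch, the model and every batch tensor are then moved explicitly, e.g.
# model = model.to(device); batch = batch.to(device)
```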
AttributeError: 'Namespace' object has no attribute 'fix_batch_size'
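This AttributeError means some code reads `args.fix_batch_size`, but the argparse parser that produced the Namespace never defined that option. The cleaner fix is to add the option to the parser; a defensive read with `getattr` also avoids the crash. A minimal sketch of the latter, where the default of `False` is an assumption about what the option means:

```python
import argparse
from typing import Any


def get_option(args: argparse.Namespace, name: str, default: Any) -> Any:
    """Return args.<name> if the parser defined it, otherwise `default`."""
    return getattr(args, name, default)


# A Namespace built by a parser that never defined --fix_batch_size
args = argparse.Namespace(batch_size=32)
# Assumption: False (do not fix the batch size) is a safe default
fix_batch_size = get_option(args, "fix_batch_size", False)
```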
2020-07-19 09:20:26.977419: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
[2020-07-19 09:20:30,777] [train] destination dir:../destination/json-20200719-092026/
[2020-07-19 09:21:06,466] [train] exception occurred
[2020-07-19 09:21:06,494] [train] Traceback (most recent call last):
File "D:\PycharmProject\DeepLearning\deep_keyphrase\deep_keyphrase\base_trainer.py", line 70, in train
self.train_func()
File "D:\PycharmProject\DeepLearning\deep_keyphrase\deep_keyphrase\base_trainer.py", line 92, in train_func
for batch_idx, batch in enumerate(self.train_loader):
File "D:\PycharmProject\DeepLearning\deep_keyphrase\deep_keyphrase\dataloader.py", line 55, in __iter__
return iter(KeyphraseDataIterator(self))
File "D:\PycharmProject\DeepLearning\deep_keyphrase\deep_keyphrase\dataloader.py", line 129, in __init__
worker.start()
File "C:\Software\Anaconda3\envs\tf-gpu\lib\multiprocessing\process.py", line 105, in start
self._popen = self._Popen(self)
File "C:\Software\Anaconda3\envs\tf-gpu\lib\multiprocessing\context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\Software\Anaconda3\envs\tf-gpu\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "C:\Software\Anaconda3\envs\tf-gpu\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
reduction.dump(process_obj, to_child)
File "C:\Software\Anaconda3\envs\tf-gpu\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
TypeError: can't pickle generator objects
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Software\Anaconda3\envs\tf-gpu\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "C:\Software\Anaconda3\envs\tf-gpu\lib\multiprocessing\spawn.py", line 115, in _main
self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
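On Windows, multiprocessing starts workers with the spawn method, which pickles the Process object and everything it references and sends it to the child; the last frames above (popen_spawn_win32.py, then reduction.dump) show exactly that step. Generators cannot be pickled, so if a worker is handed a generator, the parent fails with this TypeError and the half-started child then reports `EOFError: Ran out of input`. On Linux the default fork method does not pickle the process object, which would explain why the same code can run there. A common workaround is to give workers something picklable (a file path, a list, or an object that builds its generator lazily in `__iter__`), and on Windows the entry script also needs an `if __name__ == "__main__":` guard. The class below is a hypothetical illustration of that pattern, not deep-keyphrase's own loader.

```python
import pickle
from typing import Iterator, List


def line_generator(lines: List[str]) -> Iterator[str]:
    for line in lines:
        yield line.strip()


class LazyLines:
    """Picklable stand-in for a generator: stores the data, builds the generator on demand."""

    def __init__(self, lines: List[str]):
        self.lines = list(lines)

    def __iter__(self) -> Iterator[str]:
        # A fresh generator is created inside whichever process iterates.
        return line_generator(self.lines)


def is_picklable(obj) -> bool:
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False
```

Passing a `LazyLines` instance (or just the file path) to a spawned worker avoids pickling a live generator.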
How to run, and where to put the data?
Has anyone else run into this problem?
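Is it expected that nothing is printed during training? Right now the only message is: [2021-03-17 16:47:03,133] [train] destination dir:/home/yons/deep-keyphrase/data/kp20k/copyrnn_kp20k_basic-20210317-164703/
Is that normal? Is all the information visualized with tensorboardX only after training finishes?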
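Whether deep-keyphrase's trainer prints per-batch progress is not stated in this thread. In general, tensorboardX's `SummaryWriter` writes event files while training runs (flushing periodically), so TensorBoard pointed at the log directory can show curves before training ends. Console progress is usually added by logging every N batches; the loop below is a hypothetical illustration with stand-in loss values, not the repository's trainer.

```python
import logging
from typing import Iterable, List

logger = logging.getLogger("train")


def run_epoch(losses: Iterable[float], log_interval: int = 100) -> List[str]:
    """Hypothetical loop: `losses` stands in for per-batch loss values.

    Logs the average loss every `log_interval` batches and returns the messages.
    """
    logged: List[str] = []
    running = 0.0
    for batch_idx, loss in enumerate(losses, start=1):
        running += loss
        if batch_idx % log_interval == 0:
            msg = f"batch {batch_idx}: avg loss {running / log_interval:.4f}"
            logger.info(msg)
            logged.append(msg)
            running = 0.0
    return logged
```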
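Also, when training copyrnn I got an error saying the backend argument could not be found. I noticed that train.py does not define this argument, while train_tf defines it with a default of tf, yet training needs to determine whether the model is a torch model or a tf model. So I added the argument to train.py with a default of torch and trained again. Is that OK?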
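The change described above amounts to adding one option to the training parser. A sketch, assuming train.py builds its arguments with argparse (not confirmed in this thread) and that `torch` and `tf` are the only values the trainer checks; restricting them with `choices` makes a typo fail at startup instead of mid-training.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="CopyRNN training (sketch)")
    # Same option name as in train_tf, but defaulting to the PyTorch backend.
    parser.add_argument("--backend", choices=["torch", "tf"], default="torch",
                        help="framework the model is implemented in")
    return parser


def is_torch_backend(args: argparse.Namespace) -> bool:
    return args.backend == "torch"
```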
Hello,
the download link for the kp20k dataset no longer works. Is there a new link?