
supercoderhawk / deep-keyphrase


Seq2seq-based keyphrase generation model set, including CopyRNN, CopyCNN, and CopyTransformer

Python 99.00% Shell 1.00%
seq2seq keyphrase-generation keyphrase-extraction keyword-extraction copynet pytorch

deep-keyphrase's Issues

About the final test and evaluation

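In your answer to the "How to run test?" issue you attached some code for testing and said that running copy_rnn/predict.py is enough. However, that file has no main function, and predict_kp20k.sh runs predict_runner.py instead. So I changed the main function of predict_runner.py to: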

```python
from munch import Munch

# CopyRnnPredictor and read_json are assumed to be imported already in predict_runner.py.

# Your model path
model_path = 'data/kp20k/copyrnn_kp20k_basic-20210318-152102/copyrnn_kp20k_basic_epoch_3_batch_1355000.model'
# Your vocab path
vocab_path = 'data/vocab_kp20k.txt'
keyword_generator = CopyRnnPredictor({'model': model_path},
                                     vocab_info=vocab_path,
                                     beam_size=50,
                                     max_target_len=5,
                                     max_src_length=800)

# Test some single cases, or use as a component in an online service
tokens = ['numerous', 'studies', 'have', 'demonstrated', 'that', 'h2o2-induced', 'apoptosis', 'is', 'mediated', 'by', 'activation', 'of', 'mapks']
keyword_generator.predict([tokens], delimiter=' ', tokenized=True)

# Evaluate a whole file
src_filename = 'data/kp20k.test.jsonl'
dest_filename = 'data/kp20k_pred.jsonl'
config = read_json('data/kp20k/copyrnn_kp20k_basic-20210318-152102/copyrnn_kp20k_basic_epoch_3_batch_1355000.json')
keyword_generator.eval_predict(src_filename, dest_filename, args=Munch(config))
```

This successfully produced the kp20k_pred.jsonl file. How do I then evaluate the final results?
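A common way to score keyphrase generation is exact-match precision, recall and F1 over the top-k predictions of each document (e.g. F1@5), macro-averaged over the test set. The sketch below assumes each line of the prediction file holds the predicted and gold keyphrases under the hypothetical keys `pred_keyphrases` and `keyphrases`; check the real keys in your kp20k_pred.jsonl and pass them in. It lowercases and collapses whitespace but does not stem, whereas many published keyphrase evaluations apply Porter stemming before matching, so numbers from this sketch may not be directly comparable to reported results.

```python
import json
from typing import Dict, List, Union

Phrase = Union[str, List[str]]  # a keyphrase given as a string or as a token list


def normalize(phrase: Phrase) -> str:
    """Lowercase and collapse whitespace; token lists are joined with spaces. No stemming."""
    if isinstance(phrase, list):
        phrase = " ".join(phrase)
    return " ".join(phrase.lower().split())


def prf_at_k(pred: List[Phrase], gold: List[Phrase], k: int) -> Dict[str, float]:
    """Exact-match P/R/F1 over the top-k distinct predictions of one document (k >= 1)."""
    top: List[str] = []
    for p in map(normalize, pred):
        if p not in top:  # drop duplicate predictions, keep rank order
            top.append(p)
        if len(top) == k:
            break
    gold_set = {normalize(g) for g in gold}
    hits = sum(1 for p in top if p in gold_set)
    precision = hits / k  # divides by k even if fewer than k predictions exist
    recall = hits / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def evaluate_jsonl(path: str, k: int = 5, pred_key: str = "pred_keyphrases",
                   gold_key: str = "keyphrases") -> Dict[str, float]:
    """Macro-average prf_at_k over every non-empty line; the key names are hypothetical."""
    totals = {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    count = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            record = json.loads(line)
            for name, value in prf_at_k(record[pred_key], record[gold_key], k).items():
                totals[name] += value
            count += 1
    return {name: total / count for name, total in totals.items()} if count else totals
```

With the keys adjusted, `evaluate_jsonl('data/kp20k_pred.jsonl', k=5)` would return the averaged precision, recall and F1@5; calling it again with `k=10` gives the @10 variants. Dividing precision by k rather than by the number of predictions is one convention among several, so state which one you use when comparing results.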

OSError: [Errno 12] Cannot allocate memory

[2021-03-18 09:34:44,921] [train] destination dir:/home/yons/deep-keyphrase/data/kp20k/copyrnn_kp20k_basic-20210318-093444/
[2021-03-18 09:35:14,404] [train] exception occurred
[2021-03-18 09:35:14,406] [train] Traceback (most recent call last):
  File "/home/yons/.virtualenvs/kpg/lib/python3.7/site-packages/deep_keyphrase/base_trainer.py", line 70, in train
    self.train_func()
  File "/home/yons/.virtualenvs/kpg/lib/python3.7/site-packages/deep_keyphrase/base_trainer.py", line 92, in train_func
    for batch_idx, batch in enumerate(self.train_loader):
  File "/home/yons/.virtualenvs/kpg/lib/python3.7/site-packages/deep_keyphrase/dataloader.py", line 55, in __iter__
    return iter(KeyphraseDataIterator(self))
  File "/home/yons/.virtualenvs/kpg/lib/python3.7/site-packages/deep_keyphrase/dataloader.py", line 129, in __init__
    worker.start()
  File "/home/yons/anaconda3/lib/python3.7/multiprocessing/process.py", line 112, in start
    self._popen = self._Popen(self)
  File "/home/yons/anaconda3/lib/python3.7/multiprocessing/context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/home/yons/anaconda3/lib/python3.7/multiprocessing/context.py", line 277, in _Popen
    return Popen(process_obj)
  File "/home/yons/anaconda3/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/home/yons/anaconda3/lib/python3.7/multiprocessing/popen_fork.py", line 70, in _launch
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory

Why does this error occur?
I tried setting worker_num to 0 and to 1; neither worked.
Setting the optimizer to SGD, the batch size to 32, and max_length to 600 didn't help either.
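The traceback shows the failure happens inside `os.fork()` while `worker.start()` creates a loader worker, i.e. when a process is created and before any batch is built, which would fit the observation that smaller batches did not help. Depending on the `vm.overcommit_memory` setting, Linux can refuse a fork when the parent process already has a large amount of committed memory (for example, a fully loaded training set). A quick diagnostic sketch, assuming a Linux machine, reads `/proc/meminfo` and compares `CommitLimit` with `Committed_AS`; that gap is a hard limit only in overcommit mode 2, and only a rough indicator in the default heuristic mode 0. The helper names are illustrative, not part of deep-keyphrase.

```python
import os
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: first numeric value} (kB for most fields)."""
    info: Dict[str, int] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def commit_headroom_kb(info: Dict[str, int]) -> int:
    """CommitLimit minus Committed_AS; a hard limit only when vm.overcommit_memory is 2."""
    return info["CommitLimit"] - info["Committed_AS"]


# Only runs on Linux, where these /proc files exist.
if os.path.exists("/proc/meminfo") and os.path.exists("/proc/sys/vm/overcommit_memory"):
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    with open("/proc/sys/vm/overcommit_memory") as f:
        print("vm.overcommit_memory =", f.read().strip())
    print("MemAvailable (kB):", info.get("MemAvailable"))
    print("commit headroom (kB):", commit_headroom_kb(info))
```

Running this on the training machine while the trainer has loaded its data shows whether the machine is near its commit limit; if it is, adding swap or reducing how much data the parent process holds in memory are the usual directions to investigate.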

How to run test?

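Hello,
I followed your steps and got everything running, but I have two questions:
1) How do I enable the GPU?
2) How do I evaluate on the test data?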

error

AttributeError: 'Namespace' object has no attribute 'fix_batch_size'
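This error means some code reads `args.fix_batch_size` from an `argparse.Namespace` built by a parser that never defined that option. Defining the option in the script's own parser is the cleaner fix; as a stopgap, missing attributes can be filled in before the Namespace reaches the trainer. A minimal sketch: `apply_defaults` is a hypothetical helper, and the default `False` for `fix_batch_size` is an assumption, so check how the trainer actually uses the value.

```python
import argparse
from typing import Any, Dict


def apply_defaults(args: argparse.Namespace, defaults: Dict[str, Any]) -> argparse.Namespace:
    """Set each missing attribute on args to its default; existing values are kept."""
    for name, value in defaults.items():
        if not hasattr(args, name):
            setattr(args, name, value)
    return args


# A Namespace like one produced by a parser that never defined --fix_batch_size.
args = argparse.Namespace(batch_size=32)
# Hypothetical default; the trainer's real expectation is unknown.
args = apply_defaults(args, {"fix_batch_size": False, "batch_size": 64})
print(args.fix_batch_size, args.batch_size)  # False 32
```

Existing values win, so the `batch_size` given on the command line is not overwritten by the fallback.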

Error when running the script

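Hello, when I run the script `bash scripts/train_copyrnn_kp20k.sh` I get an error saying memory has run out (the error was attached as a screenshot). Is my machine's memory insufficient? I reduced the batch size to 32, but I still get the same error.
Is there any solution? Thanks.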

Error when running training

2020-07-19 09:20:26.977419: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
[2020-07-19 09:20:30,777] [train] destination dir:../destination/json-20200719-092026/
[2020-07-19 09:21:06,466] [train] exception occurred
[2020-07-19 09:21:06,494] [train] Traceback (most recent call last):
  File "D:\PycharmProject\DeepLearning\deep_keyphrase\deep_keyphrase\base_trainer.py", line 70, in train
    self.train_func()
  File "D:\PycharmProject\DeepLearning\deep_keyphrase\deep_keyphrase\base_trainer.py", line 92, in train_func
    for batch_idx, batch in enumerate(self.train_loader):
  File "D:\PycharmProject\DeepLearning\deep_keyphrase\deep_keyphrase\dataloader.py", line 55, in __iter__
    return iter(KeyphraseDataIterator(self))
  File "D:\PycharmProject\DeepLearning\deep_keyphrase\deep_keyphrase\dataloader.py", line 129, in __init__
    worker.start()
  File "C:\Software\Anaconda3\envs\tf-gpu\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "C:\Software\Anaconda3\envs\tf-gpu\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Software\Anaconda3\envs\tf-gpu\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Software\Anaconda3\envs\tf-gpu\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Software\Anaconda3\envs\tf-gpu\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: can't pickle generator objects

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Software\Anaconda3\envs\tf-gpu\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\Software\Anaconda3\envs\tf-gpu\lib\multiprocessing\spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
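On Windows, `multiprocessing` uses the spawn start method: as the traceback shows (`popen_spawn_win32.py` calling `reduction.dump`), the `Process` object and everything it references are pickled and sent to the child. Generator objects cannot be pickled, so the parent fails with the `TypeError`, and the half-started child then reports `EOFError: Ran out of input`. The sketch below only demonstrates that mechanism; which attribute of the repository's data iterator holds the generator is not visible in the traceback, and replacing it with picklable data (a list, or a file path the worker opens itself) is a suggestion, not the project's documented fix. `lines_gen` and `is_picklable` are illustrative names.

```python
import pickle


def lines_gen(lines):
    """A generator that lazily yields lines; it cannot be pickled."""
    for line in lines:
        yield line


def is_picklable(obj) -> bool:
    """Return True if obj survives pickle.dumps, which spawn-based workers require."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError, AttributeError):
        return False


records = ["doc one", "doc two"]
print(is_picklable(lines_gen(records)))        # False: a generator holds a suspended frame
print(is_picklable(list(lines_gen(records))))  # True: a materialized list pickles fine
```

Checking the worker's arguments with a helper like `is_picklable` before calling `start()` is one way to locate the offending object.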

How long does one training run take?

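Is it normal that nothing is printed during training? The only message so far is `[2021-03-17 16:47:03,133] [train] destination dir:/home/yons/deep-keyphrase/data/kp20k/copyrnn_kp20k_basic-20210317-164703/`. Is all the information only visualized with tensorboardX after training finishes?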

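By the way, when training copyrnn I got an error saying the `backend` argument could not be found. I noticed that train.py does not add this argument, while train_tf adds it with a default of `tf`, yet training needs to determine whether the model is a torch or a tf model. So I added this argument to train.py with a default of `torch` and then trained. Is that all right?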
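The change described above amounts to something like the following argparse sketch, assuming the option is spelled `--backend` and accepts `torch` or `tf` (the two values mentioned in the question); the rest of train.py's options are omitted. Restricting the value with `choices` makes a typo fail at startup rather than deeper inside the trainer.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Sketch of a train.py parser that gains a --backend option (other options omitted)."""
    parser = argparse.ArgumentParser(description="train copyrnn (sketch)")
    parser.add_argument("--backend", choices=["torch", "tf"], default="torch",
                        help="framework the model is implemented in")
    return parser


args = build_parser().parse_args([])  # no flags given, so the default applies
print(args.backend)  # torch
```

Passing `--backend tf` selects the other value, and anything outside `choices` makes argparse exit with a usage error.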
