thunlp / hmeae
Source code for EMNLP-IJCNLP 2019 paper "HMEAE: Hierarchical Modular Event Argument Extraction".
License: MIT License
Only stanford-corenlp-full-2018-10-05 is supported; if you use another version, it doesn't work.
Thank you for releasing the code for HMEAE.
My question about the TAC KBP 2016 dataset is:
Thank you and looking forward to your reply.
C:\Anaconda3\envs\tensorflow\python.exe F:/HMEAE-master/train.py --mode DMCNN
WARNING:tensorflow:From F:/HMEAE-master/train.py:26: The name tf.app.run is deprecated. Please use tf.compat.v1.app.run instead.
--File Extraction Finish--
--Entity Extraction Finish--
Traceback (most recent call last):
File "F:/HMEAE-master/train.py", line 26, in <module>
tf.app.run()
File "C:\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\platform\app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "C:\Anaconda3\envs\tensorflow\lib\site-packages\absl\app.py", line 299, in run
_run_main(main, args)
File "C:\Anaconda3\envs\tensorflow\lib\site-packages\absl\app.py", line 250, in _run_main
sys.exit(main(argv))
File "F:/HMEAE-master/train.py", line 15, in main
extractor.Extract()
File "F:\HMEAE-master\utils.py", line 388, in Extract
self.Event_Extract()
File "F:\HMEAE-master\utils.py", line 144, in Event_Extract
nlp = StanfordCoreNLPv2(constant.corenlp_path)
File "F:\HMEAE-master\utils.py", line 438, in __init__
super(StanfordCoreNLPv2,self).__init__(path)
File "C:\Anaconda3\envs\tensorflow\lib\site-packages\stanfordcorenlp\corenlp.py", line 46, in __init__
if not subprocess.call(['java', '-version'], stdout=subprocess.PIPE, stderr=subprocess.STDOUT) == 0:
File "C:\Anaconda3\envs\tensorflow\lib\subprocess.py", line 287, in call
with Popen(*popenargs, **kwargs) as p:
File "C:\Anaconda3\envs\tensorflow\lib\subprocess.py", line 729, in __init__
restore_signals, start_new_session)
File "C:\Anaconda3\envs\tensorflow\lib\subprocess.py", line 1017, in _execute_child
startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified.
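The FileNotFoundError above is raised while stanfordcorenlp spawns `java -version`; [WinError 2] means Windows could not find a `java` executable on PATH. As a quick sanity check before constructing the wrapper (a minimal sketch of my own, not part of HMEAE), you can probe for the JDK the same way the library does:

```python
import shutil
import subprocess

def java_available() -> bool:
    """Return True if a `java` executable is reachable on PATH.

    Mirrors the check stanfordcorenlp performs internally, but fails
    gracefully instead of raising FileNotFoundError on Windows.
    """
    if shutil.which("java") is None:  # nothing named `java` on PATH
        return False
    # Same probe the library runs: `java -version` must exit with code 0.
    return subprocess.call(
        ["java", "-version"],
        stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
    ) == 0

print(java_available())
```

If this prints False, install a JDK/JRE and add its bin directory to PATH before rerunning train.py.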
Traceback (most recent call last):
File "D:/pythonProject/HMEAE-master/train.py", line 28, in <module>
tf.app.run()
File "D:\anaconda2020\lib\site-packages\tensorflow\python\platform\app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "D:\anaconda2020\lib\site-packages\absl\app.py", line 303, in run
_run_main(main, args)
File "D:\anaconda2020\lib\site-packages\absl\app.py", line 251, in _run_main
sys.exit(main(argv))
File "D:/pythonProject/HMEAE-master/train.py", line 17, in main
extractor.Extract()
File "D:\pythonProject\HMEAE-master\utils.py", line 388, in Extract
self.Event_Extract()
File "D:\pythonProject\HMEAE-master\utils.py", line 183, in Event_Extract
tokens,offsets = nlp.word_tokenize(sent,True)
File "D:\anaconda2020\lib\site-packages\stanfordcorenlp\corenlp.py", line 173, in word_tokenize
r_dict = self._request('ssplit,tokenize', sentence)
File "D:\anaconda2020\lib\site-packages\stanfordcorenlp\corenlp.py", line 239, in _request
r_dict = json.loads(r.text)
File "D:\anaconda2020\lib\json\__init__.py", line 357, in loads
return _default_decoder.decode(s)
File "D:\anaconda2020\lib\json\decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "D:\anaconda2020\lib\json\decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
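This JSONDecodeError means the local CoreNLP server answered with something that is not JSON, typically an empty body or an error page (for example when a request times out or the annotators fail to load). A hedged sketch of a guard around the parse (the function name is my own, not from the repo) that surfaces the raw response instead of a bare decode error:

```python
import json

def parse_corenlp_response(text: str) -> dict:
    """Parse a CoreNLP server response, raising a readable error if it
    is not JSON (e.g. an HTML error page or an empty body)."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        snippet = text[:200] if text else "<empty response>"
        raise ValueError(
            f"CoreNLP did not return JSON ({exc}); "
            f"raw response starts with: {snippet!r}"
        ) from exc

# A well-formed response parses normally:
print(parse_corenlp_response('{"sentences": []}'))
```

Seeing the raw response usually makes the underlying cause (server not started, wrong port, out-of-memory) obvious.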
Thank you very much for sharing the code. I have a question about the f_score computation and would appreciate your clarification.
Let me illustrate with an example. Suppose the gold labels are as follows, where the first number is the event-type label and the second is the role label:
(2,1),(2,3),(0,0),(5,3)
and the predictions are:
(2,1),(2,4),(0,5),(5,0)
Following the computation in the code, I get TP=1, FN=3, FP=1, so R=1/4 and P=1/2.
When I reproduced other papers in this area, the computation was R=1/3: the gold positives that need to be predicted are (2,1),(2,3),(5,3), so there are 3 positive samples and R=1/3. Likewise P=1/3: three labels were predicted and only one is correct, so P is 1/3.
My understanding is that the number of positive samples in the test set should be fixed, so FN+TP should be constant.
This seems to differ from your computation. Am I calculating something wrong? Looking forward to your reply, thanks!
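For concreteness, the alternative scheme the commenter describes can be sketched as follows, assuming role label 0 means "not an argument" and that gold and predicted labels are aligned per candidate. This implements the commenter's counting (TP+FN equals the number of gold positives), not necessarily the repo's func/f_score:

```python
from fractions import Fraction

NONE_ROLE = 0  # assumption: role label 0 means "not an argument"

def prf(gold, pred):
    """Precision/recall over aligned (event_type, role) pairs, counting
    each gold positive exactly once, so TP + FN is fixed by the test set."""
    tp = fp = fn = 0
    for (g_evt, g_role), (p_evt, p_role) in zip(gold, pred):
        gold_pos = g_role != NONE_ROLE
        pred_pos = p_role != NONE_ROLE
        if gold_pos and pred_pos and (g_evt, g_role) == (p_evt, p_role):
            tp += 1
        else:
            if gold_pos:
                fn += 1  # a gold positive was missed or mislabeled
            if pred_pos:
                fp += 1  # a predicted positive did not match gold
    p = Fraction(tp, tp + fp) if tp + fp else Fraction(0)
    r = Fraction(tp, tp + fn) if tp + fn else Fraction(0)
    return p, r

gold = [(2, 1), (2, 3), (0, 0), (5, 3)]
pred = [(2, 1), (2, 4), (0, 5), (5, 0)]
print(prf(gold, pred))  # (Fraction(1, 3), Fraction(1, 3))
```

On the example above this yields P=1/3 and R=1/3, matching the commenter's calculation.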
Thank you very much for releasing the source code.
I noticed that the code for DMCNN and HMEAE (DMCNN) is released while DMBERT and HMEAE (BERT) are missing. Could you release the code for these two models?
Thank you.
The issue happens in utils.py line 183.
I checked the source code of corenlp: the word_tokenize() function only has two arguments, self and sent, so I don't know where the boolean argument comes from.
If possible, could you please help me solve this problem? Or provide the processed ACE data?
Thanks a lot!
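For context, utils.py subclasses the wrapper as StanfordCoreNLPv2, presumably so that word_tokenize(sent, True) also returns character offsets, which the stock stanfordcorenlp word_tokenize does not expose. A hedged sketch of what such an override could extract from the server's JSON (the sample response below is hand-written in the CoreNLP server's ssplit,tokenize output format, not captured from a real run):

```python
def tokens_and_offsets(r_dict):
    """Pull (token, (start, end)) pairs out of a CoreNLP JSON response.

    CoreNLP's ssplit,tokenize annotators return, per sentence, a list of
    token objects carrying characterOffsetBegin/characterOffsetEnd.
    """
    tokens, offsets = [], []
    for sentence in r_dict["sentences"]:
        for tok in sentence["tokens"]:
            tokens.append(tok["word"])
            offsets.append((tok["characterOffsetBegin"],
                            tok["characterOffsetEnd"]))
    return tokens, offsets

# Hand-made response for the sentence "He left."
sample = {
    "sentences": [{
        "tokens": [
            {"word": "He", "characterOffsetBegin": 0, "characterOffsetEnd": 2},
            {"word": "left", "characterOffsetBegin": 3, "characterOffsetEnd": 7},
            {"word": ".", "characterOffsetBegin": 7, "characterOffsetEnd": 8},
        ]
    }]
}
print(tokens_and_offsets(sample))  # (['He', 'left', '.'], [(0, 2), (3, 7), (7, 8)])
```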
> --File Extraction Finish--
> --Entity Extraction Finish--
> Traceback (most recent call last):
> File "train.py", line 26, in <module>
> tf.app.run()
> File "/home/zhangmingyu/anaconda3/envs/tf_gpu/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
> _sys.exit(main(argv))
> File "train.py", line 15, in main
> extractor.Extract()
> File "/home/zhangmingyu/HMEAE-master/utils.py", line 388, in Extract
> self.Event_Extract()
> File "/home/zhangmingyu/HMEAE-master/utils.py", line 203, in Event_Extract
> entity_start = entity_offsets[0][0]
> IndexError: list index out of range
>
I don't have the ACE 2005 dataset, so I don't know the exact format of your data input. Could you please describe your input format so that I can convert my own dataset into it? Specifically, what are the concrete forms of t_data, a_data, loader.maxlen, loader.max_argument_len, and loader.wordemb?
Hi, I have glanced through your paper and found that the final output is the role type involved in the input sentence rather than the role type of the candidate entity, because you compress/encode the input embeddings into a sentence representation/embedding, then concatenate it with the role-oriented embedding before using a softmax to get the estimated role type.
Besides, what if there are no explicit argument roles in everyday text? I mean, you can't get labeled sentences in the test set, so you don't know which role types are contained in the input sentence. How could I calculate the role-oriented embedding?
Thank you for your explanation.
Thank you very much for releasing source code about this paper.
However, I noticed you used func/f_score to calculate argument detection performance, which basically considers whether predicted roles and gold roles match. The event types are ignored in the evaluation. I think something is wrong, considering the criterion is as follows:
An argument is correctly classified if its offsets, role, related trigger type and trigger’s offsets exactly match a reference argument.
There are some cases you probably missed:
Correct me if I'm wrong, and thank you again.
Thank you for releasing the source code.
I noticed that DMBERT has a special token to indicate the event type when detecting arguments.
To utilize the event type information in our model, we append a special token into each input sequence for BERT to indicate the event type.
Could you give me more details about this operation? Maybe an example would help. Taking an attack event for example, the input might look like the following:
[CLS] [Token1] [Token2] [Token3] [Token4]...[Token 128] [SEP] [ATTACK]
What is the special token? Something like [ATTACK] or #ATTACK#?
If the special token doesn't exist in Bert's vocab file, how do you initialize the representation for the token?
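On the initialization question: the paper doesn't spell this out, but a common trick when a token such as [ATTACK] is absent from BERT's vocab is to append a new row to the embedding matrix and initialize it with, e.g., the mean of the existing embeddings before fine-tuning. A toy sketch of that idea in plain Python (the tiny vocab and 2-d vectors are illustrative assumptions, not the authors' code):

```python
def add_special_token(vocab, embeddings, token):
    """Append `token` to the vocab and give it a mean-initialized vector.

    `embeddings` is a list of equal-length float vectors, one per vocab
    entry; the new token's vector is the elementwise mean of all rows,
    a common neutral starting point that fine-tuning then adjusts.
    """
    dim = len(embeddings[0])
    mean_vec = [sum(row[d] for row in embeddings) / len(embeddings)
                for d in range(dim)]
    vocab[token] = len(embeddings)
    embeddings.append(mean_vec)
    return vocab, embeddings

# Toy 3-entry vocab with 2-dimensional embeddings.
vocab = {"[CLS]": 0, "[SEP]": 1, "attack": 2}
emb = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
add_special_token(vocab, emb, "[ATTACK]")
print(vocab["[ATTACK]"], emb[3])  # 3 [1.0, 1.0]
```

With a real BERT implementation the same effect is achieved by extending the tokenizer's vocabulary and resizing the model's input embedding matrix accordingly.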
Thank you, and looking forward to your reply.
Thank you for releasing the source code for this paper.
I ran the code with the following commands, which I believe are correct:
python train.py --gpu 1 --mode DMCNN
python train.py --gpu 1 --mode HMEAE
The performance is not as high as the paper reported. I get the following for DMCNN:
test best Precision: 0.5258620689655172, test best Recall: 0.48348745046235136, test best F1: 0.5037852718513421
And performance of HMEAE(CNN) is even worse.
I guess maybe it's because of the random data split.
other_files = [file for dir in self.dirs for file in self.source_files[dir] if dir != 'nw'] + nw[40:]
random.shuffle(other_files)
random.shuffle(other_files)
test_files = nw[:40]
dev_files = other_files[:30]
train_files = other_files[30:]
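One reason scores drift between runs is that this split is reshuffled every time. A minimal sketch of how seeding could pin the split down (sorting first to remove any dependence on filesystem ordering; this is my suggestion, not the repo's code):

```python
import random

def deterministic_split(files, n_dev=30, seed=42):
    """Shuffle a copy of `files` reproducibly and split off a dev set.

    Sorting first makes the input order canonical, so the same seed
    always yields the same dev/train partition across machines.
    """
    files = sorted(files)
    rng = random.Random(seed)  # private RNG, unaffected by global state
    rng.shuffle(files)
    return files[:n_dev], files[n_dev:]

dev_a, train_a = deterministic_split([f"doc{i}" for i in range(100)])
dev_b, train_b = deterministic_split([f"doc{i}" for i in range(100)])
print(dev_a == dev_b and train_a == train_b)  # True
```

Of course, matching the paper's exact numbers still requires the authors' actual split, which is why sharing the three JSON files is the cleaner fix.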
Could you please release or email me ([email protected]) the data split you used for the experiments (I mean the three files: train.json/dev.json/test.json)? I have the ACE data and a license for it.
Thank you very much