Comments (14)
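Hi, thanks for your interest in this project. This result does look odd. Sentence splitting for DuEE-fin uses a custom punctuation-based splitter. I did not see any text being deleted in earlier tests, so there may be a bug somewhere. Could you run the sentence splitter referenced here on this document by itself and compare the text before and after? Thanks for the feedback!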
DocEE/Data/DuEEData/build_data.py
Lines 21 to 47 in a32b6f8
from docee.
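Hi, I ran this code directly on my text, and the sentence splitting is fine.
For some reason, after running build_data.py, the sentence splits in the "sentences" field of dueefin_train_w_tgg.json are somewhat jumbled:
some sentences are split at commas, some have the digits after a percentage removed, and some are split at special symbols.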
That is strange. I will test it.
Thank you very much! I appreciate the effort!
Hi, the cause is that some sentences exceed the maximum sentence length (128), so the content beyond that limit is simply deleted.
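A minimal sketch of this effect, assuming each sentence is cut to a fixed number of characters. The helper name `truncate_sentences` and the sample text are hypothetical, and this is not the actual code from build_data.py:

```python
def truncate_sentences(sentences, max_seq_len=128):
    """Cut every sentence to at most max_seq_len characters.

    Hypothetical helper: it only illustrates the truncation effect and is
    not copied from build_data.py. Character-level cutting is an assumption.
    """
    return [sent[:max_seq_len] for sent in sentences]


# A made-up sentence of 150 characters: everything past position 128 is lost.
long_sentence = "a" * 128 + "b" * 22
kept = truncate_sentences([long_sentence])[0]
dropped = long_sentence[len(kept):]

print(len(kept))  # number of characters that survive
print(dropped)    # the trailing text that silently disappears
```

Comparing each truncated sentence with its original in this way is one quick check for whether a document in your own data is affected.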
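Thank you very much for finding this potential problem. It does affect the offline dev evaluation results, because arguments in the truncated part are not included. However, the dev-set results in the paper were run under the same setting, so they can still be compared fairly under that setting. The final performance should be judged by the online test-set results.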
For all researchers who see this issue, here is what happened: @miraitowa9 found that max_seq_len is set to 128 when building DuEE-fin. This means the gold event arguments may be fewer than the true answers (if an argument appears in the truncated part, it is set to null in the gold labels).
However, since all the baselines are compared under the same setting, the trends and rankings are still reasonable. For all subsequent researchers, I highly recommend submitting the test2 predictions to the online evaluation platform to get final results for a truly fair comparison.
Thanks again to @miraitowa9!
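To see which arguments a given limit would drop, a rough check along the following lines may help. It assumes character-level truncation and plain substring matching; `find_lost_arguments` and the sample sentence are hypothetical, not part of the DocEE code:

```python
def find_lost_arguments(sentences, arguments, max_seq_len=128):
    """Return arguments found in the full sentences but not in the truncated ones.

    Hypothetical helper: substring matching and per-character truncation
    are assumptions made for illustration only.
    """
    truncated = [sent[:max_seq_len] for sent in sentences]
    lost = []
    for arg in arguments:
        in_full = any(arg in sent for sent in sentences)
        in_cut = any(arg in sent for sent in truncated)
        if in_full and not in_cut:
            lost.append(arg)
    return lost


# Made-up example: both arguments start near or after character 128.
sentences = ["x" * 120 + "ACME Corp pledged 5% of its shares"]
arguments = ["ACME Corp", "5%"]

lost_at_128 = find_lost_arguments(sentences, arguments, max_seq_len=128)
lost_at_256 = find_lost_arguments(sentences, arguments, max_seq_len=256)
print(lost_at_128)  # arguments that would become null in the gold labels
print(lost_at_256)  # arguments still lost with a larger limit
```

In this made-up example both arguments are lost at a limit of 128 and none at 256, because the whole sentence fits within the larger limit.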
Thank you very much for your patient answers! After I set max_seq_len to 256, no arguments were lost. Would increasing max_seq_len affect the model's downstream prediction results?
For a fair comparison with other approaches, we uniformly adopt the settings of prior work. Other settings have not been tested.
OK, thanks!
Have you tried the procnet model for document-level event extraction? It also seems to perform quite well. Have you considered integrating it into your code?
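Hi, thanks for the question. I plan to maintain this repo long-term and collect as many document-level event extraction methods as possible, but I have been busy lately and my time is limited. Code contributions are welcome!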
I think this repo is really well done! So I would like to recommend this code: https://github.com/xnyuwg/procnet
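Thanks a lot. I have been following this work, and its performance and results are very good. I will find time to add it. Thanks for the recommendation!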
Great, looking forward to it!