drcknowledgeteam / drcd

A 30000+ Chinese MRC dataset - Delta Reading Comprehension Dataset

drcd's Issues

Evaluation problem

According to the paper:

F1 score and exact match from Rajpurkar et al. (2016) are used as the evaluation metrics. Both metrics ignore punctuations. In F1 score metric, we consider predictions and ground truth as bag of Chinese character.

Do the ignored punctuation marks include fullwidth ones? Is there an original evaluation script for this dataset?
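To make the question concrete, here is a minimal sketch of how the quoted description could be implemented. It is not the authors' evaluation script. It assumes that "punctuation" means every character in a Unicode punctuation category (which covers fullwidth marks such as `，` and `。`), and that whitespace is dropped as well. The function names are made up for illustration.

```python
import unicodedata
from collections import Counter


def strip_punct(text: str) -> str:
    """Drop whitespace and every Unicode punctuation character (category P*).

    Assumption: fullwidth marks count as punctuation. Symbols (category S*)
    are kept.
    """
    return "".join(
        ch for ch in text
        if not ch.isspace() and not unicodedata.category(ch).startswith("P")
    )


def exact_match(prediction: str, truth: str) -> bool:
    """Exact match after punctuation and whitespace are removed."""
    return strip_punct(prediction) == strip_punct(truth)


def char_f1(prediction: str, truth: str) -> float:
    """F1 over bags of characters, as the quoted passage describes."""
    pred_chars = Counter(strip_punct(prediction))
    true_chars = Counter(strip_punct(truth))
    # Multiset intersection: each character counts min(pred, truth) times.
    overlap = sum((pred_chars & true_chars).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_chars.values())
    recall = overlap / sum(true_chars.values())
    return 2 * precision * recall / (precision + recall)
```

Under these assumptions, `exact_match("台北市。", "台北市")` is true, because the fullwidth full stop is ignored.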

Dev problem

The dev set answers are duplicated within the same question.
Also, the SQuAD dev and test sets provide three different answers per question, so human EM performance on SQuAD is much higher than on this dataset. Are you going to provide more answers?
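One way to check the duplication is to count the questions whose answer list repeats the same text. The sketch below assumes the dev file follows a SQuAD-style layout (`data` → `paragraphs` → `qas` → `answers`, each answer carrying a `text` field). That layout is an assumption here, and the function names are hypothetical.

```python
from typing import List


def distinct_answer_texts(qa: dict) -> List[str]:
    """Return the answer texts of one question, duplicates removed, order kept."""
    seen: List[str] = []
    for answer in qa["answers"]:
        if answer["text"] not in seen:
            seen.append(answer["text"])
    return seen


def count_questions_with_duplicates(dataset: dict) -> int:
    """Count questions in which at least one answer text appears more than once."""
    count = 0
    for article in dataset["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                if len(distinct_answer_texts(qa)) < len(qa["answers"]):
                    count += 1
    return count
```

If every question turns out to have only one distinct answer, scoring against the maximum over several references (as SQuAD does) gives no benefit, which matches the concern raised above.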

How can I obtain the full dataset?

Do I need to send an email to request the full dataset?

Test set

What are the current F1 and EM scores on the test set of this Chinese MRC dataset?

Training set problem

The paper states that "the training set contains 26,932 questions in 8,014 paragraphs".
However, when I counted, I found 26,936 question IDs in the JSON file.
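A count like this can be reproduced with a short script. The sketch below again assumes a SQuAD-style layout in which each question carries an `id` field. It reports total and unique question IDs separately, since repeated IDs would be one possible source of a mismatch. `count_dataset` is a hypothetical helper name.

```python
import json


def count_dataset(dataset: dict) -> dict:
    """Count paragraphs, questions, and unique question IDs in a SQuAD-style dict."""
    paragraphs = 0
    question_ids = []
    for article in dataset["data"]:
        for paragraph in article["paragraphs"]:
            paragraphs += 1
            question_ids.extend(qa["id"] for qa in paragraph["qas"])
    return {
        "paragraphs": paragraphs,
        "questions": len(question_ids),
        "unique_question_ids": len(set(question_ids)),
    }


def count_file(path: str) -> dict:
    """Load a JSON file from `path` (supplied by the caller) and count its contents."""
    with open(path, encoding="utf-8") as handle:
        return count_dataset(json.load(handle))
```

Calling `count_file` on the training JSON and comparing `questions` with `unique_question_ids` would show whether the four extra entries are repeated IDs or additional questions.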

Format conversion to SQuAD2.0

When you used the Bert-Chinese model on DRCD as described in your paper, was any preprocessing such as format conversion needed before the model could be applied to the DRCD task?

P.S. By format conversion I mean converting the DRCD format to the SQuAD 2.0 format.
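The thread does not say what the authors did. As a rough sketch, if the DRCD files already use a SQuAD 1.1-style layout (an assumption), the main difference from SQuAD 2.0 is the per-question `is_impossible` flag, which can be set to `False` for every question because all of them are answerable. `to_squad_v2` is a hypothetical helper, not part of any library.

```python
import copy


def to_squad_v2(drcd: dict) -> dict:
    """Return a copy of a SQuAD 1.1-style dict with SQuAD 2.0 fields added.

    Assumption: the input already has data -> paragraphs -> qas -> answers.
    """
    converted = copy.deepcopy(drcd)  # leave the caller's data untouched
    converted["version"] = "v2.0"
    for article in converted["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                # Every question has an answer, so none is "impossible".
                qa.setdefault("is_impossible", False)
    return converted
```

Tools that read SQuAD 1.1 files directly may need no conversion at all. This step only matters for loaders that require the 2.0 fields.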
