# Delta Chinese QA: The Road to Chinese Question Answering (邁向中文問答之路)
Input: a short paragraph and a question.
Output: a segment of the paragraph that answers the question.
## Git-ignore large files (keep your own training data out of the repo)

    # Append every file over 90 MB to .gitignore, then drop duplicate entries.
    find . -size +90M -type f | sed 's|^\./||' >> .gitignore
    awk '!NF || !seen[$0]++' .gitignore > .gitignore.tmp && mv .gitignore.tmp .gitignore
## Jieba note
https://github.com/ldkrsi/jieba-zh_TW
## Pre-trained word vectors
https://github.com/facebookresearch/fastText/blob/master/pretrained-vectors.md
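fastText's `.vec` files are plain text: a header line with the word count and dimension, then one `word v1 v2 … v_dim` line per word. A minimal loader sketch (the function name and the `limit` parameter are our own; `limit` is handy because the full Chinese vectors are large):

```python
import io

def load_vectors(path, limit=None):
    """Parse a fastText-style .vec text file into {word: [float, ...]}.

    First line: "<n_words> <dim>"; each following line: "word v1 ... v_dim".
    """
    vectors = {}
    with io.open(path, encoding="utf-8") as f:
        n_words, dim = map(int, f.readline().split())
        for i, line in enumerate(f):
            if limit is not None and i >= limit:
                break  # stop early when only a vocabulary prefix is needed
            parts = line.rstrip("\n").split(" ")
            vectors[parts[0]] = [float(x) for x in parts[1:1 + dim]]
    return dim, vectors
```

Loading only the first ~100k lines is usually enough to cover the competition vocabulary while keeping memory manageable.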
## Kaggle
https://www.kaggle.com/c/ml-2017fall-final-chinese-qa/data
## Slides (ppt)
## Evaluation
The evaluation metric for this competition is the Mean F1-Score. The F1 score, commonly used in information retrieval, measures accuracy using the statistics precision p and recall r. Precision is the ratio of true positives (tp) to all predicted positives (tp + fp); recall is the ratio of true positives (tp) to all actual positives (tp + fn). The F1 score is given by:
F1 = 2·p·r / (p + r), where p = tp / (tp + fp) and r = tp / (tp + fn)
The F1 metric weights recall and precision equally, and a good retrieval algorithm will maximize both precision and recall simultaneously. Thus, moderately good performance on both will be favored over extremely good performance on one and poor performance on the other.
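The metric above can be sketched as token-level F1, the usual reading for span-based QA: the predicted span and the gold span are compared as bags of tokens (function names here are our own, not from the competition kit):

```python
from collections import Counter

def f1_score(pred_tokens, gold_tokens):
    """Token-level F1 between a predicted span and a gold answer span."""
    # Counter intersection gives the multiset of shared tokens (the tp count).
    common = Counter(pred_tokens) & Counter(gold_tokens)
    tp = sum(common.values())
    if tp == 0:
        return 0.0  # no overlap: precision and recall are both 0
    p = tp / len(pred_tokens)   # precision: tp / (tp + fp)
    r = tp / len(gold_tokens)   # recall:    tp / (tp + fn)
    return 2 * p * r / (p + r)

def mean_f1(predictions, golds):
    """Average the per-question F1 scores, as in Mean F1-Score."""
    return sum(f1_score(p, g) for p, g in zip(predictions, golds)) / len(predictions)
```

For Chinese answers, tokens can simply be individual characters, which sidesteps segmentation mismatches between your output and the gold span.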
## Data format

    version: string
    data: array
        title: string
        paragraphs: array
            context: string
            qas: array
                question: string
                id: uuid
                answers: array
                    answer_start: int
                    text: string
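Assuming the SQuAD-style lowercase keys of this schema, a sketch that flattens the nesting into per-answer training examples (the function name is our own):

```python
import json

def iter_examples(path):
    """Walk data -> paragraphs -> qas -> answers and yield flat tuples
    (context, question, answer_text, answer_start)."""
    with open(path, encoding="utf-8") as f:
        dataset = json.load(f)
    for article in dataset["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                for ans in qa["answers"]:
                    yield context, qa["question"], ans["text"], ans["answer_start"]
```

`answer_start` is a character offset into `context`, so `context[start:start + len(text)]` should reproduce `text`; checking that invariant while loading is a cheap sanity test on the data.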
## seq2seq
- http://blog.csdn.net/jerr__y/article/details/53749693
- (ML 2017) http://speech.ee.ntu.edu.tw/~tlkagk/courses/ML_2016/Lecture/RNN%20(v2).pdf
- http://cyruschiu.github.io/2017/02/24/learning-Tensoflow-Seq2Seq-for-translate/
Please read CONTRIBUTING.md for details on our code of conduct, and the process for submitting pull requests to us.
See also the list of contributors who participated in this project.
This project is licensed under the MIT License - see the LICENSE.md file for details
- Hat tip to anyone whose code was used