joe32140 / flatten_tokenize_convert_chinese_gigaword
This project is forked from nelson-liu/flatten_gigaword.
Dump the text of the Gigaword dataset into headline and paragraph files, with Chinese word tokenization and Simplified-to-Traditional Chinese conversion.
License: MIT License