david-smejkal / wiki2txt
A tool to extract plain (unformatted) multilingual text, redirects, links, and categories from Wikipedia backups (dumps). Designed to prepare clean training data for AI and machine learning software.
License: GNU General Public License v2.0